How to remove duplicates in the Transformer stage in parallel mode?
Answers were Sorted based on User's Feedback
Answer / kiran
Partition the data by key, sort the data, and select the
unique option. This automatically removes duplicate
data.
| Is This Answer Correct ? | 20 Yes | 3 No |
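To illustrate the idea in the answer above, here is a minimal Python sketch (not DataStage code) of what partitioning on the key, sorting, and keeping unique values amounts to. The function name `dedupe_partitioned`, the partition count, and the sample rows are assumptions made up for illustration.

```python
from collections import defaultdict

def dedupe_partitioned(rows, key, num_partitions=4):
    """Hash-partition rows on key, then sort and keep one row per key in each partition."""
    partitions = defaultdict(list)
    for row in rows:
        # Rows with the same key always land in the same partition.
        partitions[hash(row[key]) % num_partitions].append(row)

    result = []
    for part in partitions.values():
        part.sort(key=lambda r: r[key])  # stable sort keeps input order among equal keys
        previous = object()  # sentinel that never equals a real key
        for row in part:
            if row[key] != previous:
                result.append(row)
            previous = row[key]
    return result

rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 2, "name": "e"},
]
unique_rows = dedupe_partitioned(rows, "id")
print(sorted(r["id"] for r in unique_rows))
```

Because equal keys are always routed to the same partition, removing repeats inside each partition is enough to remove them from the whole data set.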
Answer / praveen sarva
STEP 1) TRANSFORMER STAGE PROPERTIES --> ADVANCED --> EXECUTION MODE --> PARALLEL
STEP 2) TRANSFORMER STAGE PROPERTIES --> INPUT --> PARTITIONING --> PARTITION TYPE --> HASH --> ENABLE SORT --> ENABLE UNIQUE
You will then get only the non-duplicate records.
| Is This Answer Correct ? | 11 Yes | 0 No |
Answer / kiran
I am not sure who marked my answer as wrong. Could you please
be responsible enough to state why it's wrong?
| Is This Answer Correct ? | 1 Yes | 0 No |
Answer / satya
Run your job in sequential mode and sort the source data,
then use stage variables in the Transformer,
because in parallel mode the data is partitioned.
| Is This Answer Correct ? | 1 Yes | 1 No |
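The concern about partitioning can be seen in a small Python sketch (not DataStage code). It assumes a hypothetical helper `dedupe_each_partition` that removes repeated keys inside each partition independently, the way a parallel stage sees only its own partition. The key values and the toy key-based split are also made up for illustration.

```python
def dedupe_each_partition(partitions):
    """Sort each partition and drop repeated keys, without looking at other partitions."""
    output = []
    for part in partitions:
        previous = None
        for key in sorted(part):
            if key != previous:
                output.append(key)
            previous = key
    return sorted(output)

keys = [10, 20, 10, 30, 20, 10]

# Round-robin: rows are dealt out in turn, ignoring the key.
round_robin = [keys[0::2], keys[1::2]]
# Key-based (toy rule): rows with the same key always go to the same partition.
key_based = [
    [k for k in keys if (k // 10) % 2 == 0],
    [k for k in keys if (k // 10) % 2 == 1],
]
# Sequential: one partition holds everything.
sequential = [keys]

print(dedupe_each_partition(round_robin))  # [10, 10, 20, 20, 30], duplicates survive
print(dedupe_each_partition(key_based))    # [10, 20, 30]
print(dedupe_each_partition(sequential))   # [10, 20, 30]
```

With a round-robin split, equal keys end up in different partitions and survive. With a key-based split, or a single partition as in sequential mode, they do not.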
Answer / prasad
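Take 2 stage variables in the Transformer stage, in this order
(stage variables are evaluated top to bottom, and the input must be sorted on Column_Name):
sV2 = If Column_Name = sV1 Then 0 Else 1
sV1 = Column_Name
Put sV2 = 1 in the constraint (you will get only unique records).
If you want the duplicates, use sV2 = 0.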
| Is This Answer Correct ? | 0 Yes | 1 No |
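The stage-variable approach compares each row's key with the key of the previous row. A minimal Python sketch of that logic follows (it is not DataStage syntax). The names `flag_first_occurrences`, `sv1`, and `sv2` are illustrative assumptions, and the input is assumed to be already sorted on the key within a single partition. Note that the comparison has to happen before the previous value is overwritten.

```python
def flag_first_occurrences(sorted_keys):
    """Return (key, sv2) pairs where sv2 is 1 for the first row of each key, else 0."""
    sv1 = None  # holds the key of the previous row
    flagged = []
    for key in sorted_keys:
        sv2 = 0 if key == sv1 else 1  # compare with the previous row first
        sv1 = key                     # then remember the current key
        flagged.append((key, sv2))
    return flagged

flagged = flag_first_occurrences(["A", "A", "B", "C", "C", "C"])
unique = [k for k, sv2 in flagged if sv2 == 1]
duplicates = [k for k, sv2 in flagged if sv2 == 0]
print(unique)      # ['A', 'B', 'C']
print(duplicates)  # ['A', 'C', 'C']
```

Filtering on `sv2 == 1` corresponds to the constraint that keeps unique records, and `sv2 == 0` to the one that keeps the repeats.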
Answer / santhosh
Go to Transformer stage properties -> Input, define any kind of partitioning there, and enable the Perform sort check box.
Also define the particular column that needs to be sorted.
This gives the sorted column in the output view.
| Is This Answer Correct ? | 1 Yes | 6 No |