How to remove duplicates in the Transformer stage in parallel mode?
Answer / kiran
Partition the data by the key column, sort it on the same key, and select the
Unique option. Duplicate rows are then removed automatically.
Is This Answer Correct ? | 20 Yes | 3 No |
Answer / praveen sarva
STEP 1) TRANSFORMER STAGE PROPERTIES --> ADVANCED -->
EXECUTION MODE --> PARALLEL
STEP 2) TRANSFORMER STAGE PROPERTIES --> INPUT -->
PARTITIONING --> PARTITION TYPE --> HASH --> ENABLE SORT -->
ENABLE UNIQUE
With these settings you will simply get the non-duplicate records.
Is This Answer Correct ? | 11 Yes | 0 No |
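The two answers above rely on the same idea: hash partitioning on the key sends every row with a given key value to the same partition, so a sort plus unique inside each partition removes all duplicates. Below is a minimal Python sketch of that idea, not DataStage code. The function names, the sample rows, and the use of crc32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Send every row with the same key value to the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is only a stand-in for the engine's own hash function (assumption).
        index = zlib.crc32(str(row[key]).encode()) % num_partitions
        partitions[index].append(row)
    return partitions


def sort_unique(rows, key):
    """Sort one partition by key and keep the first row for each key value."""
    result = []
    for row in sorted(rows, key=lambda r: r[key]):
        if not result or result[-1][key] != row[key]:
            result.append(row)
    return result


def dedupe_parallel(rows, key, num_partitions=4):
    """Hash partition, then sort and de-duplicate each partition on its own."""
    partitions = hash_partition(rows, key, num_partitions)
    output = []
    for index in sorted(partitions):
        output.extend(sort_unique(partitions[index], key))
    return output


rows = [
    {"id": 3, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 3, "v": "c"},
    {"id": 2, "v": "d"},
    {"id": 1, "v": "e"},
]
deduped = dedupe_parallel(rows, "id")
print(deduped)
```

Because Python's sort is stable, the row kept for each key is the first one that appeared in the input. The order of the combined output depends on which partition each key lands in.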
Answer / kiran
I am not sure who marked my answer as wrong. Could you please
be responsible enough to state why it's wrong?
Is This Answer Correct ? | 1 Yes | 0 No |
Answer / satya
Run your job in sequential mode, sort the source data, and
then use stage variables in the Transformer, because in
parallel mode the data is partitioned.
Is This Answer Correct ? | 1 Yes | 1 No |
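The concern about partitioned data can be illustrated with a small Python sketch (not DataStage code). It assumes the rows are spread round robin, i.e. without regard to the key, and that each partition is sorted and de-duplicated on its own. The function names and sample keys are made up for the example.

```python
def round_robin_partition(values, num_partitions):
    """Spread values across partitions without looking at the key."""
    partitions = [[] for _ in range(num_partitions)]
    for i, value in enumerate(values):
        partitions[i % num_partitions].append(value)
    return partitions


def drop_adjacent_duplicates(sorted_values):
    """Keep a value only when it differs from the previous one."""
    out = []
    for value in sorted_values:
        if not out or out[-1] != value:
            out.append(value)
    return out


keys = [10, 20, 10, 30, 20, 10]

# Parallel run: each partition is sorted and de-duplicated independently.
parallel_output = []
for part in round_robin_partition(keys, 2):
    parallel_output.extend(drop_adjacent_duplicates(sorted(part)))

# Sequential run: one sorted stream, so equal keys are always adjacent.
sequential_output = drop_adjacent_duplicates(sorted(keys))

print(parallel_output)    # [10, 20, 10, 20, 30]
print(sequential_output)  # [10, 20, 30]
```

In the parallel run the keys 10 and 20 survive in both partitions, because neither partition can see the other's rows. Running sequentially, or partitioning on the key, avoids this.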
Answer / prasad
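Take two stage variables in the Transformer stage, with the input sorted on Column_Name, and list them in this order:
sV2 = If Column_Name = sV1 Then 0 Else 1
sV1 = Column_Name
Stage variables are evaluated from top to bottom, so sV2 must compare against sV1 before sV1 is updated with the current row's value.
Put the constraint sV2 = 1 to get only the unique records.
If you want the duplicates, use sV2 = 0.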
Is This Answer Correct ? | 0 Yes | 1 No |
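The stage-variable technique is a previous-row comparison over sorted input. Here is a rough Python sketch of that logic, not DataStage syntax. The variable `previous` plays the role of sV1 and `is_new` the role of sV2; the function name and sample values are assumptions for illustration. Note that the comparison runs before `previous` is updated.

```python
def flag_first_occurrences(sorted_values):
    """Return (value, flag) pairs: flag is 1 for the first row of each value, else 0."""
    previous = None  # plays the role of sV1 (holds the previous row's value)
    flagged = []
    for value in sorted_values:
        # Plays the role of sV2: compare with the previous row first...
        is_new = 0 if value == previous else 1
        # ...and only then remember the current value for the next row.
        previous = value
        flagged.append((value, is_new))
    return flagged


values = sorted(["B", "A", "B", "C", "A"])
flagged = flag_first_occurrences(values)

unique_rows = [v for v, flag in flagged if flag == 1]      # constraint sV2 = 1
duplicate_rows = [v for v, flag in flagged if flag == 0]   # constraint sV2 = 0

print(unique_rows)     # ['A', 'B', 'C']
print(duplicate_rows)  # ['A', 'B']
```

The sketch only works when equal values sit next to each other, which is why the input is sorted first.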
Answer / santhosh
Go to Transformer stage properties -> Input, define a partitioning method there, and enable the Perform sort check box.
Also specify the column that needs to be sorted.
The output will then be sorted on that column.
Is This Answer Correct ? | 1 Yes | 6 No |