How to remove duplicates in a Transformer stage in parallel mode?

Answer / kiran

Partition the data by key, sort it, and select the Unique
option. This automatically removes duplicate records.
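As a rough illustration (Python, not DataStage code), the sort-and-unique step inside one partition behaves like the sketch below: rows are sorted on the key and one row per key value is kept. The dict row layout and the `cust_id` column are hypothetical, and keeping the first row of each key group is an assumption about which duplicate survives.

```python
from itertools import groupby
from operator import itemgetter

def sort_unique(rows, key):
    """Sort rows on `key` and keep the first row of each key group.

    sorted() is stable, so among rows with equal keys the one that
    arrived first is the one kept.
    """
    ordered = sorted(rows, key=itemgetter(key))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]

# Hypothetical input rows; "cust_id" stands in for the partition/sort key.
rows = [
    {"cust_id": 3, "name": "C"},
    {"cust_id": 1, "name": "A"},
    {"cust_id": 3, "name": "C-dup"},
    {"cust_id": 2, "name": "B"},
    {"cust_id": 1, "name": "A-dup"},
]

result = sort_unique(rows, "cust_id")
print([r["name"] for r in result])  # ['A', 'B', 'C']
```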

Answer / praveen sarva

STEP 1) Transformer stage properties --> Advanced -->
Execution mode --> Parallel

STEP 2) Transformer stage properties --> Input -->
Partitioning --> Partition type --> Hash (select the key
column(s)) --> enable Perform sort --> enable Unique

That's all: you will get only non-duplicate records.
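Hash partitioning is what makes a per-partition Unique safe: every row with a given key value is sent to the same partition. The Python sketch below uses a CRC32-based hash purely for illustration (an assumption; DataStage's own hash function is not reproduced here), and the `id` column and helper names are hypothetical.

```python
import zlib

def hash_partition(rows, key, n_partitions):
    """Assign each row to a partition by hashing its key column.

    zlib.crc32 is used because it is deterministic across runs,
    unlike Python's built-in hash() for strings.
    """
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[h % n_partitions].append(row)
    return partitions

def partitions_per_key(partitions, key):
    """Map each key value to the set of partition numbers that hold it."""
    where = {}
    for idx, part in enumerate(partitions):
        for row in part:
            where.setdefault(row[key], set()).add(idx)
    return where

rows = [{"id": k} for k in ["a", "b", "a", "c", "b", "a", "d"]]
parts = hash_partition(rows, "id", 4)
where = partitions_per_key(parts, "id")

# Each key value lives in exactly one partition, so sort + unique
# inside each partition removes duplicates across the whole data set.
print(all(len(p) == 1 for p in where.values()))  # True
```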

Answer / kiran

I am not sure who marked my answer as wrong. Could whoever
did so please state why it is wrong?

Answer / satya

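Run your job in sequential mode, sort the source data, and
then use stage variables in the Transformer to compare each
row with the previous one.

This is needed because in parallel mode the data is
partitioned, so rows with the same key can end up in
different partitions.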
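The problem with partitioned data can be sketched in Python, assuming round-robin partitioning (a method that ignores the key) and a previous-row comparison standing in for the Transformer stage variables. The helper names are hypothetical.

```python
def round_robin(rows, n_partitions):
    """Deal rows out to partitions in turn, ignoring their key values."""
    partitions = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        partitions[i % n_partitions].append(row)
    return partitions

def drop_repeats(sorted_keys):
    """Keep a key only if it differs from the previous row's key,
    like a stage-variable comparison in a Transformer."""
    out, prev = [], object()  # sentinel that never equals a real key
    for k in sorted_keys:
        if k != prev:
            out.append(k)
        prev = k
    return out

keys = sorted(["a", "b", "a", "c", "b", "a"])  # ['a', 'a', 'a', 'b', 'b', 'c']

# Sequential mode: one partition sees every row, so all repeats are caught.
sequential = drop_repeats(keys)

# Parallel mode with round-robin partitioning: each partition only
# compares against its own previous row, so duplicates survive.
parallel = [k for part in round_robin(keys, 2) for k in drop_repeats(part)]

print(sequential)  # ['a', 'b', 'c']
print(parallel)    # ['a', 'b', 'a', 'b', 'c']
```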

Answer / prasad

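Define two stage variables in the Transformer stage, in this
order:

sV2 = If Column_Name = sV1 Then 0 Else 1
sV1 = Column_Name

Stage variables are evaluated top to bottom for each row, so
sV2 must come first: it then compares the current row with
the value sV1 kept from the previous row. (If sV1 is defined
first, sV2 is always 0.) The input must be sorted, and in
parallel mode hash-partitioned, on Column_Name.

Put the constraint sV2 = 1 on the output link to get only
unique records.

If you want the duplicates instead, use sV2 = 0.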
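Because evaluation order matters, here is a small Python emulation of the two stage variables over a sorted key column. `run_stage_vars` and its `compare_first` flag are hypothetical names, and `None` stands in for the stage variable's initial value.

```python
def run_stage_vars(keys, compare_first):
    """Emulate two Transformer stage variables over a sorted key column.

    sv1 holds a key value; sv2 is 1 for a new key and 0 for a repeat.
    If compare_first is True, sv2 is evaluated before sv1 is updated,
    so it compares against the previous row's key. If False, sv1 is
    updated first, so sv2 compares the row with itself.
    """
    sv1 = None  # assumed initial value of the stage variable
    flags = []
    for col in keys:
        if compare_first:
            sv2 = 0 if col == sv1 else 1
            sv1 = col
        else:
            sv1 = col
            sv2 = 0 if col == sv1 else 1
        flags.append(sv2)
    return flags

keys = ["A", "A", "B", "C", "C", "C"]  # input already sorted on the key

good = run_stage_vars(keys, compare_first=True)
bad = run_stage_vars(keys, compare_first=False)

unique = [k for k, f in zip(keys, good) if f == 1]      # constraint sV2 = 1
duplicates = [k for k, f in zip(keys, good) if f == 0]  # constraint sV2 = 0

print(unique)      # ['A', 'B', 'C']
print(duplicates)  # ['A', 'C', 'C']
print(bad)         # [0, 0, 0, 0, 0, 0]
```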

Answer / santhosh

Go to Transformer stage properties -> Input, choose a partitioning type there, and select the Perform sort check box.

Also specify the column(s) to sort on.

This gives you sorted output.
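A quick Python sketch of what sorting inside partitions produces, assuming round-robin partitioning into two partitions and hypothetical key values: each partition is sorted on its own, duplicates remain because Unique was not selected, and the combined output is not in one global order.

```python
def round_robin(rows, n_partitions):
    """Deal rows out to partitions in turn, ignoring their key values."""
    partitions = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        partitions[i % n_partitions].append(row)
    return partitions

keys = [5, 3, 5, 1, 9, 3]

# "Perform sort" sorts each partition independently.
sorted_parts = [sorted(p) for p in round_robin(keys, 2)]
combined = [k for p in sorted_parts for k in p]

print(sorted_parts)  # [[5, 5, 9], [1, 3, 3]]
print(combined)      # [5, 5, 9, 1, 3, 3]
```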
