How to remove duplicates in the Transformer stage in parallel mode?

Answer / kiran

Partition the data by the key, sort it on the same key, and select
the Unique option. This automatically drops the duplicate
records.
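To make the idea concrete, here is a rough Python sketch of what "partition by key, sort, unique" amounts to. It is only a simulation, not DataStage code: the function name `dedupe_parallel`, the dictionary records, and the partition count are all made up for illustration.

```python
from collections import defaultdict

def dedupe_parallel(rows, key, partitions=4):
    """Hash-partition rows on `key`, then sort each partition and
    keep only the first row for every key value."""
    # Hash partitioning: equal key values always land in the same bucket.
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % partitions].append(row)

    result = []
    for part in buckets.values():
        # Python's sort is stable, so the first-arrived row per key stays first.
        part.sort(key=lambda r: r[key])
        previous = object()  # sentinel that equals no real key value
        for row in part:
            if row[key] != previous:
                result.append(row)
                previous = row[key]
    return result

# Hypothetical sample records
rows = [
    {"cid": "224", "ccode": "1000"},
    {"cid": "225", "ccode": "1000"},
    {"cid": "224", "ccode": "2000"},
    {"cid": "225", "ccode": "3000"},
]
out = dedupe_parallel(rows, "cid")
```

Because rows with the same key can never end up in different partitions, removing duplicates inside each partition is enough to remove them from the whole data set.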

Is This Answer Correct ?    20 Yes 3 No

Answer / praveen sarva

STEP 1) TRANSFORMER STAGE PROPERTIES --> ADVANCED -->
EXECUTION MODE --> PARALLEL

STEP 2) TRANSFORMER STAGE PROPERTIES --> INPUT -->
PARTITIONING --> PARTITION TYPE --> HASH --> ENABLE SORT -->
ENABLE UNIQUE

That is all; you will get the records without duplicates.

Is This Answer Correct ?    11 Yes 0 No

Answer / kiran

I am not sure who marked my answer as wrong. Could you please
state why it is wrong?

Is This Answer Correct ?    1 Yes 0 No

Answer / satya

Run your job in sequential mode, sort the source data,
and then use stage variables in the Transformer to detect the duplicates.

This is needed because in parallel mode the data is partitioned, so duplicates can end up in different partitions.
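A small Python sketch of the concern raised here, assuming a round-robin style split that ignores the key (the helper names `unique_sorted` and `round_robin` are invented for this illustration and are not DataStage functions):

```python
def unique_sorted(values):
    """Sort one partition and drop adjacent duplicates."""
    out = []
    for value in sorted(values):
        if not out or out[-1] != value:
            out.append(value)
    return out

def round_robin(values, n):
    """Deal values out to n partitions in turn, ignoring their content."""
    return [values[i::n] for i in range(n)]

keys = [224, 225, 224, 225, 226, 224]

# Sequential mode: one partition sees every row.
sequential = unique_sorted(keys)

# Parallel mode without key-based partitioning: each partition
# dedupes only the rows it happens to receive.
parallel = [v for part in round_robin(keys, 2) for v in unique_sorted(part)]
```

Here key 224 lands in both partitions, so it survives twice in `parallel`, while `sequential` contains each key once. Partitioning on the key (for example with hash partitioning) is the usual way to avoid this without giving up parallel execution.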

Is This Answer Correct ?    1 Yes 1 No

Answer / prasad

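Take two stage variables in the Transformer stage, defined in this order (stage variables are evaluated top to bottom, so sV2 must be computed before sV1 is overwritten with the current row's value):

sV2 = If Column_Name = sV1 Then 0 Else 1
sV1 = Column_Name

Put sV2 = 1 in the constraint to get only the first record for each value (the unique records). The input must be partitioned and sorted on Column_Name for the comparison with the previous row to work.

If you want the duplicates instead, use sV2 = 0.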
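The previous-row comparison can be sketched in Python as follows. This is only an illustration of the logic, not Transformer syntax; the function name `flag_first_occurrences` is made up, and `None` is assumed never to occur as a real key value. Note that the flag is computed before the current key is saved, so the comparison really sees the previous row.

```python
def flag_first_occurrences(sorted_values):
    """Return (value, sV2) pairs, where sV2 is 1 for the first row of
    each key value and 0 for every repeat. Input must be sorted."""
    sV1 = None  # holds the previous row's key; None assumed not to be a real key
    flagged = []
    for column_name in sorted_values:
        # Compare with the previous row's key BEFORE overwriting it.
        sV2 = 0 if column_name == sV1 else 1
        sV1 = column_name
        flagged.append((column_name, sV2))
    return flagged

rows = ["A", "A", "B", "C", "C", "C"]  # hypothetical sorted key column
flags = flag_first_occurrences(rows)
unique = [value for value, sV2 in flags if sV2 == 1]      # constraint sV2 = 1
duplicates = [value for value, sV2 in flags if sV2 == 0]  # constraint sV2 = 0
```

If the two assignments were swapped, the comparison would always see the current row's own value and every row would be flagged as a duplicate.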

Is This Answer Correct ?    0 Yes 1 No

Answer / santhosh

Go to Transformer stage properties -> Input, define a partitioning type there, and enable the Perform Sort check box.

Also specify the column that needs to be sorted.

The output will then come out sorted on that column.

Is This Answer Correct ?    1 Yes 6 No
