
I have 2 files: the 1st contains only duplicate records, the 2nd contains unique records. Example:
File1:
1 subhash 10000
1 subhash 10000
2 raju 20000
2 raju 20000
3 chandra 30000
3 chandra 30000
File2:
1 subhash 10000
5 pawan 15000
7 reddy 25000
3 chandra 30000
Output file:-- capture all the duplicates in both file with count.
1 subhash 10000 3
1 subhash 10000 3
1 subhash 10000 3
2 raju 20000 2
2 raju 20000 2
3 chandra 30000 3
3 chandra 30000 3
3 chandra 30000 3

Question Posted / ankit gosain


Answer Posted / ankit gosain

Hi,

This problem can be solved by creating a job with the following
stages:

          File2                  File2
            |                      |
            |                      |
            |                      |
File1-----Funnel----Aggregator----Join----Filter---Tgt_File
                                   |
                                   |
                                   |
                                 File1

1. Funnel both files together (now you have the unique and the
duplicate records in one stream).
2. Aggregate, grouping on the key column, and set the
calculation type = Count Rows (say the o/p column is row_count).
3. Join the aggregated o/p with the i/p records from File1 and
File2 on the key, and set the join type = Inner Join.
4. In the Filter stage, set the where clause to row_count>1.
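
To check the logic outside DataStage, the four steps above can be sketched in plain Python. This is only an illustration, not DataStage code: the function name duplicates_with_count is made up for this example, the first column is assumed to be the key, and the final sort is added just so the rows come out in the same order as the expected output in the question.

```python
from collections import Counter

# Sample rows from the question: (id, name, salary)
file1 = [
    (1, "subhash", 10000), (1, "subhash", 10000),
    (2, "raju", 20000), (2, "raju", 20000),
    (3, "chandra", 30000), (3, "chandra", 30000),
]
file2 = [
    (1, "subhash", 10000), (5, "pawan", 15000),
    (7, "reddy", 25000), (3, "chandra", 30000),
]


def duplicates_with_count(*inputs):
    """Hypothetical helper mirroring Funnel -> Aggregator -> Join -> Filter."""
    # Step 1 (Funnel): append all input rows into one stream.
    funneled = [row for rows in inputs for row in rows]
    # Step 2 (Aggregator): count rows per key (assumed to be column 0).
    row_count = Counter(row[0] for row in funneled)
    # Step 3 (Join): attach the count to every input row on the key.
    joined = [row + (row_count[row[0]],) for row in funneled]
    # Step 4 (Filter): keep only rows whose key occurs more than once.
    return sorted(row for row in joined if row[-1] > 1)


result = duplicates_with_count(file1, file2)
for row in result:
    print(*row)
```

With the sample data this prints the eight rows of the expected output file; keys 5 and 7 are dropped because their row_count is 1.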

If you have any further doubts or queries, contact me at
ankitgosian@gmail.com

Cheers,
Ankit :)
