I have two files. The first contains only duplicate records; the second contains unique records. For example:
File1:
1 subhash 10000
1 subhash 10000
2 raju 20000
2 raju 20000
3 chandra 30000
3 chandra 30000
File2:
1 subhash 10000
5 pawan 15000
7 reddy 25000
3 chandra 30000
Output file:-- capture all the duplicates in both file with count.
1 subhash 10000 3
1 subhash 10000 3
1 subhash 10000 3
2 raju 20000 2
2 raju 20000 2
3 chandra 30000 3
3 chandra 30000 3
3 chandra 30000 3
Answer / subbuchamala
File1,File2====Funnel-----Copy=======1st link AGG, 2nd link JOIN----Filter----OutputFile
1. pass the 2 files to funnel stage and then copy stage.
2. from copy stage 1st link to AGG stage, 2nd link to JOIN stage
3. In AGG stage, Group by Key column say ID, NAME take the count and JOIN based on KEY column
4. Filter on COUNT>1 send the output OutputFile
we get desired output
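The steps above can be sketched in Python to check the logic outside DataStage. This is a minimal sketch, assuming the sample records are held in memory as (id, name, salary) tuples; the names `key_of` and `find_duplicates` are hypothetical, and each stage is only imitated with plain list and `Counter` operations, not with any DataStage API.

```python
from collections import Counter

# Sample records from the question: (id, name, salary)
FILE1 = [
    ("1", "subhash", 10000), ("1", "subhash", 10000),
    ("2", "raju", 20000), ("2", "raju", 20000),
    ("3", "chandra", 30000), ("3", "chandra", 30000),
]
FILE2 = [
    ("1", "subhash", 10000), ("5", "pawan", 15000),
    ("7", "reddy", 25000), ("3", "chandra", 30000),
]


def key_of(rec):
    # Group/join key: ID and NAME, as suggested in the answer
    return rec[0], rec[1]


def find_duplicates(file1, file2):
    # Funnel: concatenate both inputs into one stream
    funnel = list(file1) + list(file2)
    # Copy: the same stream feeds two links (Aggregator and Join)
    agg_link, join_link = funnel, funnel
    # Aggregator: count rows per key
    counts = Counter(key_of(rec) for rec in agg_link)
    # Join: attach each key's count to every row on the second link
    joined = [rec + (counts[key_of(rec)],) for rec in join_link]
    # Filter: keep only keys that occur more than once
    dups = [rec for rec in joined if rec[3] > 1]
    # Sort by ID so the rows come out grouped, as in the sample output
    return sorted(dups, key=lambda rec: int(rec[0]))


if __name__ == "__main__":
    for rec in find_duplicates(FILE1, FILE2):
        print(*rec)
```

Running it prints the eight rows of the expected output. The final sort by ID only reproduces the sample ordering; the funnel itself simply appends File2 after File1.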
Answer / ankit gosain
Hi,
This problem can be solved by creating a job with the following stages:
File1, File2 -> Funnel -> Aggregator -> Join -> Filter -> Tgt_File
(File1 and File2 are also read a second time as additional inputs to the Join.)
1. Funnel both files (the stream now contains both the unique and the duplicate records).
2. In an Aggregator stage, group on the key column(s) and set the calculation type to Count Rows (say, output column row_count).
3. Join the Aggregator output with the input files (File1 and File2) on the key, with join type = Inner Join.
4. In a Filter stage, set the where clause to row_count > 1.
If you have further doubts or questions, contact me at
ankitgosian@gmail.com
Cheers,
Ankit :)
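This second job can be sketched the same way, with the emphasis on the inner join back to the original files. This minimal Python sketch assumes in-memory (id, name, salary) tuples and uses ID alone as the key; `aggregate_row_count`, `inner_join` and `run_job` are hypothetical names, not DataStage APIs. Because the counts are computed from the very rows being joined, every row finds a match in the inner join here, so it is the row_count > 1 filter that actually drops pawan and reddy.

```python
from collections import Counter

# Sample records from the question: (id, name, salary)
FILE1 = [
    ("1", "subhash", 10000), ("1", "subhash", 10000),
    ("2", "raju", 20000), ("2", "raju", 20000),
    ("3", "chandra", 30000), ("3", "chandra", 30000),
]
FILE2 = [
    ("1", "subhash", 10000), ("5", "pawan", 15000),
    ("7", "reddy", 25000), ("3", "chandra", 30000),
]


def aggregate_row_count(rows):
    # Aggregator: "Count Rows" grouped on the key column (ID here),
    # giving one row_count value per key
    return Counter(row[0] for row in rows)


def inner_join(row_counts, *inputs):
    # Inner join: an input row is emitted only if its key is present in
    # the aggregated output; the matching row_count is appended to it
    joined = []
    for rows in inputs:
        for row in rows:
            if row[0] in row_counts:
                joined.append(row + (row_counts[row[0]],))
    return joined


def run_job(file1, file2):
    # Funnel File1 and File2 into a single stream for the Aggregator
    funnel = file1 + file2
    row_counts = aggregate_row_count(funnel)
    # Join the aggregated output back to File1 and File2
    joined = inner_join(row_counts, file1, file2)
    # Filter stage: where row_count > 1
    return [row for row in joined if row[3] > 1]


if __name__ == "__main__":
    for row in run_job(FILE1, FILE2):
        print(*row)
```

The rows come out in File1-then-File2 order (the extra subhash and chandra rows from File2 appear last), so a sort on the key is needed if the target file must be grouped like the sample output.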