I have two files: the first contains only duplicate records, and the second contains unique records. Example:
File1:
1 subhash 10000
1 subhash 10000
2 raju 20000
2 raju 20000
3 chandra 30000
3 chandra 30000
File2:
1 subhash 10000
5 pawan 15000
7 reddy 25000
3 chandra 30000
Output file: capture every record that occurs more than once across both files, along with its total count.
1 subhash 10000 3
1 subhash 10000 3
1 subhash 10000 3
2 raju 20000 2
2 raju 20000 2
3 chandra 30000 3
3 chandra 30000 3
3 chandra 30000 3
Answer Posted / subbuchamala
File1,File2====Funnel-----Copy=======1st link AGG, 2nd link JOIN----Filter----OutputFile
1. pass the 2 files to funnel stage and then copy stage.
2. from copy stage 1st link to AGG stage, 2nd link to JOIN stage
3. In AGG stage, Group by Key column say ID, NAME take the count and JOIN based on KEY column
4. Filter on COUNT>1 send the output OutputFile
we get desired output
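The logic of these stages can be checked outside DataStage. The following is only a rough Python sketch of the same idea, not a DataStage job: the function name `duplicates_with_count` and the tuple layout (ID, NAME, SALARY) are assumptions made for illustration, and the final sort is added only so the rows come out in the same order as the sample output.

```python
from collections import Counter

# Sample data from the question, as (ID, NAME, SALARY) tuples.
file1 = [
    (1, "subhash", 10000), (1, "subhash", 10000),
    (2, "raju", 20000), (2, "raju", 20000),
    (3, "chandra", 30000), (3, "chandra", 30000),
]
file2 = [
    (1, "subhash", 10000), (5, "pawan", 15000),
    (7, "reddy", 25000), (3, "chandra", 30000),
]


def duplicates_with_count(*inputs, key=lambda r: (r[0], r[1])):
    """Hypothetical helper mirroring Funnel -> Aggregator/Join -> Filter."""
    # Funnel: combine all input files into one stream of rows.
    funnelled = [row for rows in inputs for row in rows]
    # Aggregator: count rows per key (ID, NAME by default).
    counts = Counter(key(row) for row in funnelled)
    # Join: attach the count to every row with the same key.
    joined = [row + (counts[key(row)],) for row in funnelled]
    # Filter: keep only rows whose key occurs more than once.
    kept = [row for row in joined if row[-1] > 1]
    # Sort so the rows are grouped by ID, as in the sample output.
    return sorted(kept)


result = duplicates_with_count(file1, file2)
for row in result:
    print(*row)
```

With the sample files, the rows for pawan and reddy have a count of 1 and are dropped, which leaves the eight rows shown in the expected output.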