I have two files: the first contains only duplicate records, and the second contains unique records. Example:
File1:
1 subhash 10000
1 subhash 10000
2 raju 20000
2 raju 20000
3 chandra 30000
3 chandra 30000
File2:
1 subhash 10000
5 pawan 15000
7 reddy 25000
3 chandra 30000
Output file: capture every record that occurs more than once across both files, along with its total count.
1 subhash 10000 3
1 subhash 10000 3
1 subhash 10000 3
2 raju 20000 2
2 raju 20000 2
3 chandra 30000 3
3 chandra 30000 3
3 chandra 30000 3
Answer Posted / ankit gosain
Hi,
This problem can be solved by creating a job with the following
stages:
          File2                  File2
            |                      |
            |                      |
            |                      |
File1-----Funnel----Aggregator----Join----Filter---Tgt_File
                                   |
                                   |
                                   |
                                 File1
1. Funnel both files together (the combined stream now holds the
unique and the duplicate records).
2. Aggregate, grouping on the key column (the same column used
for the join in step 3), with calculation type = Count Rows
(say the o/p column is row_count).
3. Join the aggregated o/p back to the i/p records of File1 and
File2 on the key, with join type = Inner Join.
4. In the Filter stage, set the where clause to row_count>1.
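The four steps above can be checked outside DataStage with a small script. This is only a rough Python sketch of the same logic, not a DataStage job: the function name duplicates_with_count is hypothetical, each record is treated as a whole line, the first field is assumed to be the key, and the final sort is added only to match the order of the expected output.

```python
from collections import Counter

# Sample data copied from the question.
file1 = [
    "1 subhash 10000", "1 subhash 10000",
    "2 raju 20000", "2 raju 20000",
    "3 chandra 30000", "3 chandra 30000",
]
file2 = [
    "1 subhash 10000", "5 pawan 15000",
    "7 reddy 25000", "3 chandra 30000",
]


def duplicates_with_count(first, second):
    """Hypothetical helper mirroring Funnel -> Aggregator -> Join -> Filter."""
    # Step 1 (Funnel): combine both inputs into one stream.
    funneled = first + second
    # Step 2 (Aggregator): count rows per record.
    row_count = Counter(funneled)
    # Step 3 (Join): attach the count back to every input record.
    joined = [(row, row_count[row]) for row in funneled]
    # Step 4 (Filter): keep only records with row_count > 1.
    kept = [(row, n) for row, n in joined if n > 1]
    # Assumption: order by the numeric key in the first field.
    kept.sort(key=lambda pair: int(pair[0].split()[0]))
    return [f"{row} {n}" for row, n in kept]


result = duplicates_with_count(file1, file2)
for line in result:
    print(line)
```

Records 5 and 7 occur only once across both files, so the filter drops them, and the remaining eight lines match the expected output in the question.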
If you have any further doubts or queries, reach me at
ankitgosian@gmail.com
Cheers,
Ankit :)