I have two files: the first contains only duplicate records, and the second contains only unique records. Example:
File1:
1 subhash 10000
1 subhash 10000
2 raju 20000
2 raju 20000
3 chandra 30000
3 chandra 30000
File2:
1 subhash 10000
5 pawan 15000
7 reddy 25000
3 chandra 30000
Output file: capture every record that occurs more than once across both files, along with its total count.
1 subhash 10000 3
1 subhash 10000 3
1 subhash 10000 3
2 raju 20000 2
2 raju 20000 2
3 chandra 30000 3
3 chandra 30000 3
3 chandra 30000 3
Answer Posted / ankit gosain
Hi,
This problem can be solved by creating a job with the following
stages:
          File2                    File2
            |                        |
            |                        |
            |                        |
File1-----Funnel----Aggregator----Join----Filter---Tgt_File
                                     |
                                     |
                                     |
                                   File1
1. Funnel both files together (you now have the unique and the
duplicate records in one stream).
2. Aggregate on any input column and set the calculation type
to Count Rows (with an output column named, say, row_count).
3. Join the aggregated output with the input File1 and File2 on
the key, and set the join type to Inner Join.
4. In the Filter stage, set the where clause to row_count>1.
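The same four steps can be tried out quickly outside DataStage. The following is only a rough Python sketch of the logic, not a DataStage job: the function name duplicates_with_count and the variables FILE1 and FILE2 are made up for illustration, each record is treated as one whole line, and the final sort is an assumption added so that the result comes out in the order shown in the question.

```python
from collections import Counter

# Sample data copied from the question; each record is one whole line.
FILE1 = [
    "1 subhash 10000", "1 subhash 10000",
    "2 raju 20000", "2 raju 20000",
    "3 chandra 30000", "3 chandra 30000",
]
FILE2 = [
    "1 subhash 10000", "5 pawan 15000",
    "7 reddy 25000", "3 chandra 30000",
]


def duplicates_with_count(file1_rows, file2_rows):
    """Hypothetical helper mirroring Funnel -> Aggregator -> Join -> Filter."""
    # Step 1 (Funnel): combine both inputs into a single stream.
    funneled = list(file1_rows) + list(file2_rows)

    # Step 2 (Aggregator): count rows per record (the row_count column).
    row_count = Counter(funneled)

    # Step 3 (Join): attach row_count back to every input row.
    # Step 4 (Filter): keep only rows where row_count > 1.
    kept = [f"{rec} {row_count[rec]}" for rec in funneled if row_count[rec] > 1]

    # Assumption: sort as plain text so the rows appear grouped by key,
    # as in the sample output (this is not part of the four steps).
    return sorted(kept)


if __name__ == "__main__":
    for line in duplicates_with_count(FILE1, FILE2):
        print(line)
```

Here the join is done against the funneled stream, so every input row gets its count back. The sort compares lines as text, which is enough for this sample but would need a numeric key for larger ids.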
If you have any further doubts or queries, reach me at
ankitgosian@gmail.com
Cheers,
Ankit :)