If we join two tables (such as emp and dept) using the Join stage,
how can we improve the performance?
Answer / kiran
Whenever you join two tables on key columns, partition both
inputs on the join key: if the key column is numeric (integer),
use modulus partitioning; if the key column is non-numeric, use
hash partitioning. Compared to the Lookup stage, Join gives
better performance because Join has a sort operation by default.
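
The two partitioning techniques named above can be illustrated with a minimal sketch. This is not DataStage code: the function names, the sample rows, and the use of `zlib.crc32` as the hash function are all assumptions made for illustration. It only shows why partitioning both inputs on the join key puts matching emp and dept rows in the same partition.

```python
import zlib

def modulus_partition(key: int, num_partitions: int) -> int:
    """Modulus partitioning: partition number = integer key mod partition count."""
    return key % num_partitions

def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: hash the key first, then take the remainder.
    crc32 is an arbitrary stand-in for whatever hash the engine uses."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_rows(rows, key_name, num_partitions, partitioner):
    """Distribute rows into partitions according to the chosen partitioner."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partitioner(row[key_name], num_partitions)].append(row)
    return partitions

# Hypothetical sample data: deptno is an integer key, so modulus applies.
emp = [
    {"empno": 1, "deptno": 10},
    {"empno": 2, "deptno": 20},
    {"empno": 3, "deptno": 30},
    {"empno": 4, "deptno": 10},
]
dept = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
    {"deptno": 30, "dname": "SALES"},
]

NUM_PARTITIONS = 4
emp_parts = partition_rows(emp, "deptno", NUM_PARTITIONS, modulus_partition)
dept_parts = partition_rows(dept, "deptno", NUM_PARTITIONS, modulus_partition)

# Every emp row sits in the same partition as its matching dept row,
# so each partition can be joined independently of the others.
for p in range(NUM_PARTITIONS):
    dept_keys = {d["deptno"] for d in dept_parts[p]}
    for e in emp_parts[p]:
        assert e["deptno"] in dept_keys
```

Modulus skips the hashing step, which is the usual reason given for preferring it when the key is already an integer.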
Answer / ashok
The above answer has one mistake:
the Join stage does not have a sort operation by default; we have to
specify the sort explicitly.
Hi, this is Poorna.
We can improve the performance of the Join stage by
pre-sorting both the left and the right input data on the
join key.
Please correct me if I am mistaken.
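
To show why pre-sorted inputs help, here is a minimal sketch of a sort-merge inner join. It is an illustration only, not how the Join stage is implemented internally; the function name `merge_join` and the sample rows are assumptions. Because both inputs are already in key order, each one is read once from start to end, with no searching for matches.

```python
def merge_join(left, right, key):
    """Inner join of two lists of dicts that are ALREADY sorted by `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left key has no match yet, advance left
        elif lk > rk:
            j += 1          # right key has no match yet, advance right
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result

# Hypothetical sample data, both lists pre-sorted on deptno.
emp_sorted = [
    {"empno": 1, "deptno": 10},
    {"empno": 2, "deptno": 10},
    {"empno": 3, "deptno": 20},
    {"empno": 4, "deptno": 40},
]
dept_sorted = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
    {"deptno": 30, "dname": "SALES"},
]

joined = merge_join(emp_sorted, dept_sorted, "deptno")
for row in joined:
    print(row)
```

Employee 4 (deptno 40) and department 30 have no partner and are dropped, as expected for an inner join.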
Answer / rajeshchunduri
In the emp and dept tables the key column is deptno, so the join is
key-based, and the datatype of the key column is int. In this case we
change the partitioning technique from hash to modulus.
chunduri
Answer / professional
Hi,
To improve the performance of this join on the key columns of emp and dept: by default, DataStage sorts the join inputs for better performance. If your data is already sorted, go to the environment variables and set the option that turns off this automatic sort (the variable is written above from memory; it is most likely APT_NO_SORT_INSERTION). The redundant sort is then skipped and performance improves.