Why do Hash joins usually perform better than Merge Joins?
Answer / narayana
In a merge join, the rows to be joined must be present on the same AMP. If they are not, Teradata will either redistribute the data or duplicate it in spool to make that happen, based on the row hash of the columns involved in the join's WHERE clause. A hash join takes place if one or both of the tables can fit completely inside each AMP's memory. The AMP holds the small table in memory for joins happening on row hash.
Usually the optimizer first identifies the smaller table and sorts it in join-column row hash sequence. If the smaller table is really small and fits in memory, performance is best. Otherwise, the sorted smaller table is duplicated to all the AMPs. The larger table is then processed one row at a time, doing a binary search against the smaller table for a matching record. The larger table does not have to be sorted for this, which is the main saving over a merge join.
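The steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Teradata's implementation: `row_hash` is a hypothetical stand-in built on CRC32 (Teradata's real hashing algorithm is different), and the table names and rows are made up.

```python
import bisect
import zlib


def row_hash(value):
    """Hypothetical stand-in for a row hash (NOT Teradata's real algorithm)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def hash_join_sketch(small_rows, large_rows, small_key, large_key):
    # Sort the smaller table once, in join-column row hash sequence.
    build = sorted(
        ((row_hash(r[small_key]), r) for r in small_rows),
        key=lambda pair: pair[0],
    )
    hashes = [h for h, _ in build]

    joined = []
    # Scan the larger table one row at a time; it is never sorted.
    for big in large_rows:
        h = row_hash(big[large_key])
        i = bisect.bisect_left(hashes, h)  # binary search on the hash
        # Visit every entry with this hash and compare the real key,
        # so that hash collisions do not produce false matches.
        while i < len(hashes) and hashes[i] == h:
            small = build[i][1]
            if small[small_key] == big[large_key]:
                joined.append({**small, **big})
            i += 1
    return joined


# Made-up sample data: a small lookup table and a larger fact table.
departments = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Ops"},
]
employees = [
    {"emp": "a", "dept_id": 20},
    {"emp": "b", "dept_id": 10},
    {"emp": "c", "dept_id": 30},
]
hash_join_result = hash_join_sketch(departments, employees, "dept_id", "dept_id")
```

The result rows come out in the order of the larger table, because that table is only scanned, never reordered. Employee "c" has no matching department and is dropped, as in an inner join.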
In a merge join where the join columns are not indexed, on the other hand, Teradata redistributes the table rows into spool and sorts them by row hash code, so that matching rows lie on the same AMP and the join can happen on the redistributed data.
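For comparison, here is a rough Python sketch of the merge join path: both tables are redistributed to AMPs by row hash, each AMP's spool is sorted, and the two sorted lists are merged. Again this is an assumption-laden illustration: `row_hash`, the `n_amps` parameter, and the modulo mapping of hash to AMP are hypothetical simplifications, not how Teradata actually maps hashes to AMPs.

```python
import zlib
from collections import defaultdict


def row_hash(value):
    """Hypothetical stand-in for a row hash (NOT Teradata's real algorithm)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def redistribute(rows, key, n_amps):
    # Send each row to an AMP chosen from its row hash (simplified: modulo),
    # then sort every AMP's spool by row hash.
    spool = defaultdict(list)
    for r in rows:
        h = row_hash(r[key])
        spool[h % n_amps].append((h, r))
    for amp_rows in spool.values():
        amp_rows.sort(key=lambda pair: pair[0])
    return spool


def merge_join_sketch(left_rows, right_rows, key, n_amps=4):
    left_spool = redistribute(left_rows, key, n_amps)
    right_spool = redistribute(right_rows, key, n_amps)
    joined = []
    for amp in range(n_amps):
        left = left_spool.get(amp, [])
        right = right_spool.get(amp, [])
        i = j = 0
        # Classic two-pointer merge over the two sorted spools.
        while i < len(left) and j < len(right):
            if left[i][0] < right[j][0]:
                i += 1
            elif left[i][0] > right[j][0]:
                j += 1
            else:
                h = left[i][0]
                i_end, j_end = i, j
                while i_end < len(left) and left[i_end][0] == h:
                    i_end += 1
                while j_end < len(right) and right[j_end][0] == h:
                    j_end += 1
                # Pair up rows with equal hash, checking the real key.
                for _, l_row in left[i:i_end]:
                    for _, r_row in right[j:j_end]:
                        if l_row[key] == r_row[key]:
                            joined.append({**l_row, **r_row})
                i, j = i_end, j_end
    return joined


# Made-up sample data.
dept_rows = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Ops"},
]
emp_rows = [
    {"emp": "a", "dept_id": 20},
    {"emp": "b", "dept_id": 10},
    {"emp": "c", "dept_id": 30},
]
merge_join_result = merge_join_sketch(dept_rows, emp_rows, "dept_id")
```

Note that here both inputs are redistributed and sorted before any row is matched, whereas the hash join only prepares the smaller table. On a large table that extra sort is typically where the cost difference comes from.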
In Teradata, what is the significance of UPSERT command?
What is the difference between V2R5 and V2R6?
What is the use of upsert command in teradata?
How to Skip or Get first and Last Record from Flat File through MultiLoad and TPUMP Utility?
How many of Codd's rules are satisfied by the Teradata database?
What are the enhanced features in Teradata V2R5 and V2R6?
Can you fastexport a field, which is primary key by putting equality on that key?
How many types of joins are there in teradata?
Can we have two time dimensions in a schema (either star or snowflake)? For example, if we want the joining date of an employee and also today's sales by time, can we have two time dimensions to accommodate both?
How many types of index are present in teradata?
How do you do backup and recovery in teradata?
Hello all, there is a table with 4 columns, of which 3 have already been loaded with 5 million records. The 4th column is empty. I now have 5 million records of data that have to be loaded into the 4th column. How can I load this data quickly into the 4th column without using UPDATE?