|
Archives of the TeradataForumMessage Posted: Fri, 19 Apr 2002 @ 18:00:57 GMT
I was hoping somebody could shed some light on a problem I see with large table \ small table joins. Below, I've described 3 scenarios of joining a large table with a small table. Scenario 1 Table A (large...~300 million) ColA ColB ColC ColD Day_Cd Curr_Cd Meas1 Meas2 Meas... -Primary Index (ColA, ColB) -Without NUSI on Curr_Cd, Day_Cd Table B (small...< 10,000 records) Day_Cd Curr_Cd Xchg_Rate -Unique Primary Index (Day_Cd, Curr_Cd) When I joined Table A to Table B on (Day_Cd/Curr_Cd) the optimizer would redistribute the large table (Table A) to all AMPs based upon the primary index of Table B (poor performance) Scenario 2 Table A (large...~300 million) ColA ColB ColC ColD Day_Cd Curr_Cd Meas1 Meas2 Meas... -Primary Index (ColA, ColB) -Without NUSI on Curr_Cd, Day_Cd Table B (small...< 10,000 records) Dmy_Prim_Indx default value 1 Day_Cd Curr_Cd Xchg_Rate -Unique Primary Index (Dmy_Prmy_Indx) - I'm populating this Primary Index value with '1' for every row in the table When I joined Table A to Table B on (Day_Cd/Curr_Cd), the optimizer would duplicate Table B to all AMPs then perform a single partition hash join based upon the join condition (great performance, but hokey) Scenario 3 Table A (large...~300 million) ColA ColB ColC ColD Day_Cd Curr_Cd Meas1 Meas2 Meas... -Primary Index (ColA, ColB) -With NUSI on Curr_Cd, Day_Cd Table B (small...< 10,000 records) Day_Cd Curr_Cd Xchg_Rate -Unique Primary Index (Day_Cd, Curr_Cd) When I joined Table A to Table B on (Day_Cd/Curr_Cd), the optimizer would duplicate Table B to all AMPs then perform a single partition hash join based upon the join condition (great performance, not hokey) In summary However, I had to create a NUSI on these join columns so that the optimizer didn't make a bad decisions. As you know, secondary indexes on large table take lots of diskpspace, slow inserts, updates, deletes, and can sometimes degredate run time performance. The better solution would be to collect statistics on these join columns without a secondary index (currently not possible because the we can only collect stats on a single field if we don't build a secondary index.) Other than the 'fake primary index of value 1'- (example 2) or a single table primary index, can you think of another way to avoid redistributing a large table when joining (on multiple columns) to a small table without adding a secondary index to the large table based upon the join condition to the smaller table? Thanks, Claybourne Barrineau
| ||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||
Copyright 2016 - All Rights Reserved | ||||||||||||||||||||||||||||||||||||||||||||||||
Last Modified: 15 Jun 2023 | ||||||||||||||||||||||||||||||||||||||||||||||||