Archives of the TeradataForum

Message Posted: Fri, 19 Apr 2002 @ 18:00:57 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Large Table \ Small Table Join Question

From:		Claybourne L. Barrineau

I was hoping somebody could shed some light on a problem I see with large table \ small table joins. Below, I've described 3 scenarios of joining a large table with a small table.

Scenario 1
Table A (large...~300 million)
ColA
ColB
ColC
ColD
Day_Cd
Curr_Cd
Meas1
Meas2
Meas...
-Primary Index (ColA, ColB)
-Without NUSI on Curr_Cd, Day_Cd

Table B (small...< 10,000 records)
Day_Cd
Curr_Cd
Xchg_Rate
-Unique Primary Index (Day_Cd, Curr_Cd)

When I joined Table A to Table B on (Day_Cd/Curr_Cd) the optimizer would redistribute the large table (Table A) to all AMPs based upon the primary index of Table B (poor performance)

Scenario 2
Table A (large...~300 million)
ColA
ColB
ColC
ColD
Day_Cd
Curr_Cd
Meas1
Meas2
Meas...
-Primary Index (ColA, ColB)
-Without NUSI on Curr_Cd, Day_Cd

Table B (small...< 10,000 records)
Dmy_Prim_Indx default value 1
Day_Cd
Curr_Cd
Xchg_Rate

-Unique Primary Index (Dmy_Prmy_Indx) - I'm populating this Primary Index value with '1' for every row in the table

When I joined Table A to Table B on (Day_Cd/Curr_Cd), the optimizer would duplicate Table B to all AMPs then perform a single partition hash join based upon the join condition (great performance, but hokey)

Scenario 3
Table A (large...~300 million)
ColA
ColB
ColC
ColD
Day_Cd
Curr_Cd
Meas1
Meas2
Meas...
-Primary Index (ColA, ColB)
-With NUSI on Curr_Cd, Day_Cd

Table B (small...< 10,000 records)
Day_Cd
Curr_Cd
Xchg_Rate
-Unique Primary Index (Day_Cd, Curr_Cd)

In summary
When the optimizer had better statistics about Day_Cd/Curr_Cd (via the creation of a NUSI), it avoided making the poor decision of redistributing the large table to all amps based upon the primary index (or join columns) of the smaller table.

However, I had to create a NUSI on these join columns so that the optimizer didn't make a bad decisions. As you know, secondary indexes on large table take lots of diskpspace, slow inserts, updates, deletes, and can sometimes degredate run time performance. The better solution would be to collect statistics on these join columns without a secondary index (currently not possible because the we can only collect stats on a single field if we don't build a secondary index.)

Other than the 'fake primary index of value 1'- (example 2) or a single table primary index, can you think of another way to avoid redistributing a large table when joining (on multiple columns) to a small table without adding a secondary index to the large table based upon the join condition to the smaller table?

Thanks,

Claybourne Barrineau


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference