Archives of the TeradataForum

Message Posted: Tue, 30 Jul 2002 @ 13:58:51 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Hash vs Merge Join

From:		Claybourne Barrineau

All,

I have a case where a hash join is efficient than a merge join. The problem is, in this case, that the optimizer always wants to perform a merge join. Yes...I have collected all the necessary statistics. My system has 110 amps.

TableA=
(
ColumnA Integer Not Null
ColumnB
ColumnC
ColumnD
...
)
NUPI (ColumnA)

*Note = Logical Primary Key (ColumnA, ColumnB, ColumnC, ColumnD)

Average Record Count Per Amp = ~30,000
Max Record Count Per Amp = ~50,000
Min Record Count Per Amp = ~12,000

TableB= ( ColumnA Integer Not Null ColumnE ColumnF ... ) NUPI (ColumnA)

*Note = Logical Primary Key (ColumnA, ColumnE, ColumnF)

Average Record Count Per Amp = ~7,000
Max Record Count Per Amp = ~9,000
Min Record Count Per Amp = ~6,000

Example of Query =
Select *
From TableA, TableB
Where TableA.ColumnA = TableB.ColumnB

Given the information that the Optimizer has to work with, I cannot fault its decision to perform a merge join; however, as it turns out, the resulting work load when performing a merge join is highly skewed.

Merge Join - ~# of Joins
Average Per Amp = 1,400,000
Max Per Amp = 2,900,000
Min Per Amp = 400,000

Via changes to the DDL of Table A (UPI = Logical Primary Key) and Table B (NUPI = New Column with Value of '1' for every record), I can force a hash join.

Hash Join - ~# of Joins
Average Per Amp = 1,400,000
Max Per Amp = 1,430,000
Min Per Amp = 1,370,000

The hash join query runs faster and the workload on the box is much more evenly distributed (in terms of Spool size and CPU usage).

Does anybody think that it is even possible for the Optimizer to be smart enough to realize that a hash join (in this case) is more reasonable than a merge join?

Thanks,

Clay


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference