Archives of the TeradataForum
Message Posted: Wed, 01 Jun 2005 @ 17:17:40 GMT
Subj:  Re: Why does a badly skewed join run that badly?
From:  Anomy Anom
<-- Anonymously Posted: Wednesday, June 01, 2005 13:03 -->
| "...shouldn't a totally skewed merge join of a volatile table with 500.000 rows on 42 amps amount to a perfectly distributed merge join
of a volatile table with 22 million rows?" | |
No. To oversimplify: say table A has n rows and table B has m rows. Teradata first compares the RowHash values. If the RowHash values within each table are nearly unique and in sorted order, the number of compares is roughly proportional to m+n, and only pairs of rows (one from A, one from B) with matching RowHash need further processing. That provides a huge speed-up. If the RowHash values are nearly all the same, the number of RowHash compares alone grows to roughly m*n, and every pair of rows (one from A, one from B) with the same RowHash needs additional processing to apply the actual join and residual filter criteria. If there are so many duplicates that the rows for a given RowHash no longer all fit in cache, it gets even worse: you pay for extra I/O on top of the extra CPU.
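
To see the effect on the amount of work, here is a toy sketch (in Python, not Teradata's actual join code; the table sizes and hash values are invented) that counts the hash compares and the row pairings a textbook merge join performs over two sorted lists:

    # Toy merge join over two sorted lists of "RowHash" values, counting work.
    # Illustration only -- not Teradata's join code; sizes/values are made up.
    def merge_join_work(a, b):
        """Return (hash_compares, row_pairings) for a merge join of sorted lists."""
        compares = pairings = 0
        i = j = 0
        while i < len(a) and j < len(b):
            compares += 1
            if a[i] < b[j]:
                i += 1
            elif a[i] > b[j]:
                j += 1
            else:
                # Matching RowHash: every A row in this duplicate group pairs
                # with every B row in the group and needs further processing.
                value = a[i]
                i2, j2 = i, j
                while i2 < len(a) and a[i2] == value:
                    i2 += 1
                while j2 < len(b) and b[j2] == value:
                    j2 += 1
                pairings += (i2 - i) * (j2 - j)
                i, j = i2, j2
        return compares, pairings

    # Nearly unique, sorted hashes: work stays around m + n.
    print(merge_join_work(list(range(500)), list(range(500))))   # (500, 500)

    # All rows share one hash value: pairings explode to m * n.
    print(merge_join_work([7] * 500, [7] * 500))                 # (1, 250000)

With nearly unique hashes both counts stay around m+n; with one shared hash value the pairing count jumps to m*n, which is the extra CPU (and, once the duplicate set no longer fits in cache, the extra I/O) described above.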