Archives of the TeradataForum

Message Posted: Wed, 01 Jun 2005 @ 17:17:40 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Why does a badly skewed join run that badly?

From:		Anomy Anom

<-- Anonymously Posted: Wednesday, June 01, 2005 13:03 -->

"...shouldn't a totally skewed merge join of a volatile table with 500.000 rows on 42 amps amount to a perfectly distributed merge join of a volatile table with 22 million rows?"

No. To oversimplify: Let's say you have table A with n rows and table B with m rows. Teradata first compares the RowHash values. If the RowHash values within each table are nearly unique and in sorted order, the number of compares for that is roughly proportional to m+n, and only pairs of rows (one from A and one from B) with matching RowHash require further processing. That provides a huge speed-up effect. If the RowHash values are nearly all the same, just the number of RowHash compares increases to be roughly proportional to m*n; plus every combination of a pair of rows (one from A, one from B) with the same RowHash requires some additional processing to apply the actual join and residual filter criteria. If there are so many duplicates that neither table's rows with a given RowHash all stay in cache, that's even worse because you may have lots of extra I/O not just CPU.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference