|
|
Archives of the TeradataForum
Message Posted: Thu, 27 Mar 2008 @ 21:55:47 GMT
Subj: | | Re: How GROUP BY improves the performance than |
|
From: | | Christie, Jon |
Teradata does DISTINCT and GROUP BY in different ways.
For DISTINCT, the optimizer redistributes based on the hash of the DISTINCT columns, then does a sort to eliminate duplicates. When the number
of DISTINCT values is large, this performs very well, but when the number of values is small, performance is terrible because many amps get no
rows at all. A small number of amps does all the work.
For GROUP BY, the optimizer uses a SUM step. Inside the SUM step, there is almost always a redistribution based on the hash of the GROUP BY
columns. SUM steps perform well when the number of GROUP BY values is small, and poorly when the number of values is large.
So... When you know the number of values is small, use GROUP BY. When you know the number of values is large, use DISTINCT.
There is a proposal to make the optimizer smart enough to use the method that will perform the best. Some how, it keep slipping out of
releases. Last I heard, it was scheduled for 13.0.0.
| |