Archives of the TeradataForum
Message Posted: Thu, 27 Mar 2008 @ 21:55:47 GMT
Teradata does DISTINCT and GROUP BY in different ways.
For DISTINCT, the optimizer redistributes based on the hash of the DISTINCT columns, then does a sort to eliminate duplicates. When the number of DISTINCT values is large, this performs very well, but when the number of values is small, performance is terrible because many amps get no rows at all. A small number of amps does all the work.
For GROUP BY, the optimizer uses a SUM step. Inside the SUM step, there is almost always a redistribution based on the hash of the GROUP BY columns. SUM steps perform well when the number of GROUP BY values is small, and poorly when the number of values is large.
So... When you know the number of values is small, use GROUP BY. When you know the number of values is large, use DISTINCT.
There is a proposal to make the optimizer smart enough to use the method that will perform the best. Some how, it keep slipping out of releases. Last I heard, it was scheduled for 13.0.0.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 23 Jun 2019|