Archives of the TeradataForum

Message Posted: Thu, 27 Mar 2008 @ 21:55:47 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: How GROUP BY improves the performance than

From:		Christie, Jon

Teradata does DISTINCT and GROUP BY in different ways.

For DISTINCT, the optimizer redistributes based on the hash of the DISTINCT columns, then does a sort to eliminate duplicates. When the number of DISTINCT values is large, this performs very well, but when the number of values is small, performance is terrible because many amps get no rows at all. A small number of amps does all the work.

For GROUP BY, the optimizer uses a SUM step. Inside the SUM step, there is almost always a redistribution based on the hash of the GROUP BY columns. SUM steps perform well when the number of GROUP BY values is small, and poorly when the number of values is large.

So... When you know the number of values is small, use GROUP BY. When you know the number of values is large, use DISTINCT.

There is a proposal to make the optimizer smart enough to use the method that will perform the best. Some how, it keep slipping out of releases. Last I heard, it was scheduled for 13.0.0.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Archives

2016		2007
2015		2006
2014		2005
2013		2004
2012		2003
2011		2002
2010		2001
2009		2000
2008		1999

2008 Indexes

Jan		Jul
Feb		Aug
Mar		Sep
Apr		Oct
May		Nov
Jun		Dec

Last Modified: 15 Jun 2023