Archives of the TeradataForum

Message Posted: Mon, 10 Jun 2002 @ 20:04:04 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Derived table

From:		David A. Hough

The rule on distinct versus group by is 'it depends'. If the result set contains only a few unique values, then the vproc local processing done by group by is most efficient. If the result set is mostly distinct (only a few duplicates) then the global sort done by distinct is more efficient.

I don't have a good rule of thumb for the 'distinctness' break even point. My limited testing shows that it seems to depend on row length as well as row count, so there may be some some caching issues involved. My SWAG:

- if the grouped row count is < 10% of the original row count, use group by.

- if the grouped row count is > 95% of the original row count, use distinct.

- if the grouped row count is between 10% & 95% of the original, test.

Note that subqueries use distinct be default, which is why adding group by to a subquery can sometimes improve performance dramatically.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Archives

2016		2007
2015		2006
2014		2005
2013		2004
2012		2003
2011		2002
2010		2001
2009		2000
2008		1999

2002 Indexes

Jan		Jul
Feb		Aug
Mar		Sep
Apr		Oct
May		Nov
Jun		Dec

Last Modified: 15 Jun 2023