Archives of the TeradataForum
Message Posted: Mon, 10 Jun 2002 @ 20:04:04 GMT
The rule on distinct versus group by is 'it depends'. If the result set contains only a few unique values, then the vproc local processing done by group by is most efficient. If the result set is mostly distinct (only a few duplicates) then the global sort done by distinct is more efficient.
I don't have a good rule of thumb for the 'distinctness' break even point. My limited testing shows that it seems to depend on row length as well as row count, so there may be some some caching issues involved. My SWAG:
- if the grouped row count is < 10% of the original row count, use group by.
- if the grouped row count is > 95% of the original row count, use distinct.
- if the grouped row count is between 10% & 95% of the original, test.
Note that subqueries use distinct be default, which is why adding group by to a subquery can sometimes improve performance dramatically.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 27 Dec 2016|