Home Page for the TeradataForum

Archives of the TeradataForum

Message Posted: Mon, 10 Jun 2002 @ 20:04:04 GMT

  <Prev Next>   <<First <Prev

Subj:   Re: Derived table
From:   David A. Hough

The rule on distinct versus group by is 'it depends'. If the result set contains only a few unique values, then the vproc local processing done by group by is most efficient. If the result set is mostly distinct (only a few duplicates) then the global sort done by distinct is more efficient.

I don't have a good rule of thumb for the 'distinctness' break even point. My limited testing shows that it seems to depend on row length as well as row count, so there may be some some caching issues involved. My SWAG:

- if the grouped row count is < 10% of the original row count, use group by.

- if the grouped row count is > 95% of the original row count, use distinct.

- if the grouped row count is between 10% & 95% of the original, test.

Note that subqueries use distinct be default, which is why adding group by to a subquery can sometimes improve performance dramatically.

  <Prev Next>   <<First <Prev
  Top Home Privacy Feedback  
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023