Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Thu, 27 Mar 2008 @ 21:55:47 GMT


     
  <Prev Next>   <<First <Prev
Next>
Last>>
 


Subj:   Re: How GROUP BY improves the performance than
 
From:   Christie, Jon

Teradata does DISTINCT and GROUP BY in different ways.

For DISTINCT, the optimizer redistributes based on the hash of the DISTINCT columns, then does a sort to eliminate duplicates. When the number of DISTINCT values is large, this performs very well, but when the number of values is small, performance is terrible because many amps get no rows at all. A small number of amps does all the work.

For GROUP BY, the optimizer uses a SUM step. Inside the SUM step, there is almost always a redistribution based on the hash of the GROUP BY columns. SUM steps perform well when the number of GROUP BY values is small, and poorly when the number of values is large.

So... When you know the number of values is small, use GROUP BY. When you know the number of values is large, use DISTINCT.

There is a proposal to make the optimizer smart enough to use the method that will perform the best. Some how, it keep slipping out of releases. Last I heard, it was scheduled for 13.0.0.



     
  <Prev Next>   <<First <Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023