Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Fri, 15 Sep 2006 @ 09:38:02 GMT


     
  <Prev Next>   <<First <Prev
Next>
Last>>
 


Subj:   Re: Which method is faster?
 
From:   Michael Larkins

This is one of the areas I address often. DISTINCT hashs and sends every value as it is read out of the table. Then the values are sorted on the AMPs and duplicates are then eliminated.

GROUP BY does aggregation on each AMP to eliminate duplicates. Then, hashs and sends all the values that are left. After redistribution, each AMP with values does the aggregation again (at most one per AMP) to eliminate the duplicates.

Therefore, if the table contains many duplicate values, GROUP BY is more efficient. However, if the table has no or very few duplicates, DISTINCT is actually faster because it does not spend the time looking for duplicates that don't exist and then redistribute all the values. That is time that DISTINCT does not waste because it redistributes the values immediate.

So, it depends on your data which one is more efficient.


Regards,

Michael Larkins
Certified Teradata Master
Certified Teradata SQL Instructor



     
  <Prev Next>   <<First <Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023