|
Archives of the TeradataForumMessage Posted: Fri, 15 Sep 2006 @ 09:38:02 GMT
This is one of the areas I address often. DISTINCT hashs and sends every value as it is read out of the table. Then the values are sorted on the AMPs and duplicates are then eliminated. GROUP BY does aggregation on each AMP to eliminate duplicates. Then, hashs and sends all the values that are left. After redistribution, each AMP with values does the aggregation again (at most one per AMP) to eliminate the duplicates. Therefore, if the table contains many duplicate values, GROUP BY is more efficient. However, if the table has no or very few duplicates, DISTINCT is actually faster because it does not spend the time looking for duplicates that don't exist and then redistribute all the values. That is time that DISTINCT does not waste because it redistributes the values immediate. So, it depends on your data which one is more efficient. Regards, Michael Larkins
| ||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||
Copyright 2016 - All Rights Reserved | ||||||||||||||||||||||||||||||||||||||||||||||||
Last Modified: 15 Jun 2023 | ||||||||||||||||||||||||||||||||||||||||||||||||