Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 21 Jun 2011 @ 08:05:25 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Distinct vs. group by in TD 13 onwards
 
From:   Anomy Anom

<-- Anonymously Posted: Tuesday, June 21, 2011 12:13 -->

Maybe this may help a little

Query Rewrite

A long standing feature of Teradata aggregation processing has been the fact that we do DISTINCT processing in a different manner than we do GROUP BY processing. This is apparent when a user runs a request such as:


SELECT DISTINCT c1 FROM tbl1;

In this scenario, the column c1 has about five distinct values and more than half of the rows in the table have just one value. The table itself has several million rows. The plan generated to service this request redistributes the table into a spool file, hashing on the column c1 and then does a sort to eliminate duplicates. Since more than half the rows distribute to a single amp, the request can be processing intensive and risks exceeding the user's spool limit. By comparison, the query can be written as follows (which returns identical results):


SELECT c1 FROM tbl1 GROUP BY 1;

In this case, the plan to service this request uses a sum step even though no aggregate is requested in the query. Because of this, an answer is returned more quickly and the user's spool limit is not exceeded.


Creating a single plan for each of these SQL approaches which optimizes the aggregation processing is the intention of this feature enhancement. The new feature encompasses the following considerations and benefits:

- Rewriting the "DISTINCT query" as a "Group-By" query requires only a single aggregation operator.

- Pre-TD13 optimizer does better planning for Group-By than for DISTINCT.

- Runtime benefits can literally save minutes from a query depending on the size of the table(s) and the rows per value.

- Applicable for single table as well as multiple table queries.

- Select list columns are deployed as groupings column(s) of the query during rewrite.

- Rewrite is skipped when: Aggregation/OLAP is in SELECTed list of query columns. WITH, HAVING, SAMPLE, RANDOM, ORDER BY clauses are present in the query

- Reversal Feature: Group-By queries can also be rewritten as Distinct queries for subqueries. This will reduce number of joins and allow the optimizer to do better join planning.



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023