Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Mon, 10 Nov 2003 @ 16:07:20 GMT


     
  <Prev Next>   <<First <Prev
Next>
Last>>
 


Subj:   Re: Count of Distinct Rows
 
From:   Claybourne Barrineau

Teradata is looking into this. Thanks for everyone's input!

I'm attaching an example of the query and a good and bad explain.


Thanks,

Clay

select t1.cma_no
from  CMA_TABLES.cma t1
left outer join
      CMA_TABLES.cma_org_select t2
on    t1.cma_no             = t2.cma_no
where t2.cma_no is null
and   t1.cma_status         = 'A';
********************************************
Without stats on T2's UPI:
The optimizer decides to redistribute T2 based on T1s UPI (CMA_No)
elapsed time=~2 seconds
********************************************

40 AMPs

Explanation
--------------------------------------------------
 
  1)First, we lock a distinct CMA_TABLES."pseudo table" for read on a RowHash to prevent global deadlock for CMA_TABLES.t2.  
  2)Next, we lock a distinct CMA_TABLES."pseudo table" for read on a RowHash to prevent global deadlock for CMA_TABLES.t1.  
  3)We lock CMA_TABLES.t2 for read, and we lock CMA_TABLES.t1 for read.  
  4)We do an all-AMPs RETRIEVE step from CMA_TABLES.t2 by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with low confidence to be 3,617,040 rows. The estimated time for this step is 7.62 seconds.  
  5)We do an all-AMPs JOIN step from CMA_TABLES.t1 by way of a RowHash match scan with a condition of ("CMA_TABLES.t1.CMA_STATUS = 'A'"), which is joined to Spool 2 (Last Use). CMA_TABLES.t1 and Spool 2 are left outer joined using a merge join, with a join condition of ("CMA_TABLES.t1.CMA_NO = Cma_No"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with low confidence to be 2,761,961 rows. The estimated time for this step is 0.77 seconds.  
  6)We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan with a condition of ("Cma_No IS NULL") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2,761,961 rows. The estimated time for this step is 1.44 seconds.  
  7)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 9.83 seconds.  


********************************************
With stats on T2's UPI (which looks like 1 unique value to the optimizer):
The optimizer decides to duplicate T2 to all amps then join to a T1 amp
locally
elapsed time=~1:55 seconds
********************************************
Explanation
--------------------------------------------------
 
  1)First, we lock a distinct CMA_TABLES."pseudo table" for read on a RowHash to prevent global deadlock for CMA_TABLES.t2.  
  2)Next, we lock a distinct CMA_TABLES."pseudo table" for read on a RowHash to prevent global deadlock for CMA_TABLES.t1.  
  3)We lock CMA_TABLES.t2 for read, and we lock CMA_TABLES.t1 for read.  
  4)We execute the following steps in parallel.  
   
  1) We do an all-AMPs RETRIEVE step from CMA_TABLES.t1 by way of an all-rows scan with a condition of ("CMA_TABLES.t1.CMA_STATUS = 'A'") into Spool 2 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 253,925 rows. The estimated time for this step is 0.76 seconds.
 
   
  2) We do an all-AMPs RETRIEVE step from CMA_TABLES.t2 by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 3 by row hash. The result spool file will not be cached in memory. The size of Spool 3 is estimated with high confidence to be 144,205,840 rows. The estimated time for this step is 6 minutes and 20 seconds.
 
  5)We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan, which is joined to Spool 3 (Last Use). Spool 2 and Spool 3 are left outer joined using a merge join, with a join condition of ("CMA_NO = Cma_No"). The result goes into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with low confidence to be 2,752,879 rows. The estimated time for this step is 12.36 seconds.  
  6)We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an all-rows scan with a condition of ("Cma_No IS NULL") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2,752,879 rows. The estimated time for this step is 1.43 seconds.  
  7)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 6 minutes and 34 seconds.  



     
  <Prev Next>   <<First <Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023