Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Sat, 12 Jun 2004 @ 20:23:27 GMT



Subj:   Re: Performance Improvement on this Query without Collecting Statistics
 
From:   Dieter Noeth

Naveen Ram Prasanna Kambhoji wrote:

  I have Pasted the Explain plan for your reference ....  


  Explanation  


  5) We execute the following steps in parallel.  


  1) We do an all-AMPs RETRIEVE step from AI_OLAP.dt by way of an all-rows scan with no residual conditions into Spool 4 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 4 by row hash. The size of Spool 4 is estimated with high confidence to be 17,020 rows. The estimated time for this step is 0.05 seconds.  


Do you run that query on a 20-AMP system? 17,020 duplicated rows / 20 AMPs = 851 rows in the Date table according to statistics, but in your first post you claimed "882 rows in DATE table". So you added one more month without updating statistics ;-)

This is not a problem in this query, but it might lead to incorrect spool-size estimates when you access the table with a "where date_col between ... and ...". As the table is small, you should always keep up-to-date stats on it.
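Recollecting stats on a small date table is cheap, so it can simply be rerun after each monthly load. A minimal sketch (table and column names are taken from the EXPLAIN output and query; adjust to your actual DDL, and note that COLLECT STATISTICS syntax varies between Teradata releases):

     -- Refresh stats on the small Date table after each monthly load
     COLLECT STATISTICS ON AI_OLAP."DATE" COLUMN DateID;

With fresh stats the optimizer's 851-row estimate would again match the real row count, and BETWEEN predicates on the date column get realistic selectivity.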


  2) We do an all-AMPs RETRIEVE step from AI_OLAP.i by way of an all-rows scan with no residual conditions into Spool 5 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 5 by row hash. The result spool file will not be cached in memory. The size of Spool 5 is estimated with low confidence to be 128,800,340 rows. The estimated time for this step is 14 minutes and 2 seconds.  


There are no statistics defined, so the optimizer uses a Random AMP Sample to estimate the number of rows.

"around 180 Million rows in DAILY_SPEND" vs. "low confidence to be 128,800,340 rows" would indicate the table is totally skewed.


  6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a RowHash match scan, which is joined to Spool 5 (Last Use). Spool 4 and Spool 5 are joined using a merge join, with a join condition of ("DateID = DateID"). The result goes into Spool 3  



  ...  



  7) We do an all-AMPs SUM step to aggregate from Spool 3 (Last  



  ...  



  8) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by row hash. The result spool file will not be cached in memory.  



  9) We do an all-AMPs MERGE into AI_OLAP.FA_BCWK_IMP from Spool 1 (Last Use).  


Try to get rid of that "spool both tables and merge join" by using Thomas's suggestion:

     FROM DAILY_SPEND s JOIN
        (select BCWeekID, min(DateID) as minDate, max(DateID) as maxDate
         from "DATE" group by 1) dt
     ON s.DateID between dt.minDate and dt.maxDate

This will force the optimizer to use a product join, so it'll probably not spool DAILY_SPEND.

If this query merges 7 daily rows into 1 weekly row, a pre-aggregation will probably not help that much, but you can still try...

But you'll probably get the best performance when you can eliminate the join entirely, as Victor suggested. What's so special about BCWeekID that you can't calculate it on the fly ;-)
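As an illustration only: if BCWeekID turns out to be nothing more than a sequential week number counted from some fixed anchor date, it can be derived in the query itself and the join to the Date table disappears. Everything here is an assumption (the anchor date, the column names, and that weeks are plain 7-day blocks):

     -- Hypothetical: derive the week id from the date itself
     -- (anchor date and 7-day week are assumptions about BCWeekID)
     SELECT (DateID - DATE '1900-01-01') / 7 AS BCWeekID_calc,
            SUM(spend_amt)
     FROM DAILY_SPEND
     GROUP BY 1;

Teradata date subtraction yields days, and integer division truncates, so each 7-day block maps to one id. If BCWeekID follows a business calendar instead, this won't work and the lookup join stays.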


  BTW what is the use of 8th Step???  


"SORT to order Spool 1 by row hash" -> After that the spool rows are sorted in the right order to be merged into the target table in step 9.


Dieter



Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016