Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Fri, 29 Jun 2007 @ 23:09:49 GMT


     
  <Prev Next>  
<<First
<Prev
Next> Last>>  


Subj:   How to get confidence on derived spool
 
From:   Kopuuravuri, Ramu

Hi,

I have a major volume of data in history (DBviews.stmnt_sub) tables with partition on date range.

I have a scenario to take 3 partitions data into 2 different spools. And do the left outer join between these two spools.

I'm getting no confidence on Join spool. Pls provide best practice solution to active this prob.

Query :

     sel a.id, a.cyc_dt, b.bill_yy_mm, date_rank, b.curr_ind
     from DBviews.stmnt_sub a
     left join
     (
               sel id, bill_yy_mm
                 , curr_ind
                 , RANK() OVER (partition by id ORDER BY bill_yy_mm desc) as date_rank
               from
               (
                 sel id, substr(cyc_dt,1,7) as bill_yy_mm, max(curr_ind) as curr_ind
                 from DBviews.stmnt_sub
                 where confirm_ind='Y'
                 and cyc_dt between cast('2007-03-01' as DATE FORMAT 'YYYY-MM-DD')
                                and cast('2007-05-31' as DATE FORMAT 'YYYY-MM-DD')
                 and id > 0
                 group by 1,2
               ) a
               QUALIFY SUM(1) Over(
                  PARTITION BY id, bill_yy_mm
                  ORDER BY bill_yy_mm
                  ROWS UNBOUNDED PRECEDING
                  ) = 1
     ) b
               on a.id=b.id
               and a.cyc_dt between cast('2007-03-01' as DATE FORMAT 'YYYY-MM-DD')
                     and cast('2007-05-31' as DATE FORMAT 'YYYY-MM-DD')
               and a.id > 0
     where
               a.cyc_dt between ADD_MONTHS(a.cyc_dt, -4) and current_date
               and a.id > 0
               and substr(a.cyc_dt,1,7)=b.bill_yy_mm

Explanation

Explanation
--------------------------------------------------
 
  1)First, we lock DB.stmnt_sub for access.  
  2)Next, we do an all-AMPs SUM step to aggregate from 3 partitions of DB.stmnt_sub with a condition of ( "(DB.stmnt_sub.cyc_dt >= DATE '2007- 03-01') AND ((DB.stmnt_sub.cyc_dt <= DATE '2007-05-31') AND ((DB.stmnt_sub.confirm_ind = 'Y') AND (DB.stmnt_sub.id > 0 )))"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 4. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 4 is estimated with low confidence to be 135,646,877 rows. The estimated time for this step is 5.74 seconds.  
  3)We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 135,646,877 rows. The estimated time for this step is 1.72 seconds.  
  4)We do an all-AMPs STAT FUNCTION step from Spool 1 (Last Use) by way of an all-rows scan into Spool 9 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 7 (all_amps), which is built locally on the AMPs.  
  5)We do an all-AMPs STAT FUNCTION step from Spool 7 (Last Use) by way of an all-rows scan into Spool 12 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 11 (all_amps), which is built locally on the AMPs.  
  6)We execute the following steps in parallel.  
   
  1) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of an all-rows scan with a condition of ("Field_6 = 1") into Spool 2 (all_amps), which is built locally on the AMPs. The size of Spool 2 is estimated with no confidence to be 135,646,877 rows. The estimated time for this step is 1.82 seconds.
 
   
  2) We do an all-AMPs RETRIEVE step from 3 partitions of DB.stmnt_sub with a condition of ( "(DB.stmnt_sub.cyc_dt >= (ADD_MONTHS((DB.stmnt_sub.cyc_dt ),-4 ))) AND ((DB.stmnt_sub.cyc_dt <= DATE) AND ((DB.stmnt_sub.id > 0) AND ((DB.stmnt_sub.cyc_dt >= DATE '2007- 03-01') AND ((DB.stmnt_sub.cyc_dt <= DATE '2007-05-31') AND (DB.stmnt_sub.id > 0 )))))") locking for access into Spool 16 (all_amps), which is built locally on the AMPs. The size of Spool 16 is estimated with no confidence to be 105,109,185 rows. The estimated time for this step is 2.90 seconds.
 
  7)We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of an all-rows scan into Spool 17 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 17 by row hash. The size of Spool 17 is estimated with no confidence to be 135,646,877 rows. The estimated time for this step is 6.16 seconds.  
  8)We do an all-AMPs JOIN step from Spool 16 (Last Use) by way of a RowHash match scan, which is joined to Spool 17 (Last Use) by way of a RowHash match scan. Spool 16 and Spool 17 are joined using a merge join, with a join condition of ("(id = id) AND ((SUBSTR(cyc_dt (VARCHAR(10), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT 'YYYY-MM-DD'),1 ,7 ))= BILL_YY_MM)"). The result goes into Spool 15 (group_amps), which is built locally on the AMPs. The size of Spool 15 is estimated with no confidence to be 135,646,877 rows. The estimated time for this step is 2.54 seconds.  
  9)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 15 are sent back to the user as the result of statement 1.  


Step 6 have no confidence, how to improve confidence level to at least low.

Appreciate for your suggestions.


Regards,

Ramu.



     
  <Prev Next>  
<<First
<Prev
Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023