Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 14 Jul 2015 @ 19:14:34 GMT


     
  <Prev Next>  
<<First
<Prev
Next> Last>>  


Subj:   Update query running slow
 
From:   Bakthavachalam, Roopalini

Forum - The below query is running for 90 minutes on a 22 nodes 540 AMP Teradata system. I am trying to bring down the execution time with no luck.

Is there anything that I can try to reduce the execution time?

Here is the query with explain

     UPDATE A
     FROM  TEST.TABLE1 AS A
        , (SELECT DISTINCT COL1,COL2,START_DATE
             , END_DATE,PLCFLAG
           FROM TEST.TABLE2
          ) AS B
      SET PLACE_FLAG_CD_old = B.PLCFLAG
     WHERE A.COL1 = B.COL1
        AND A.COL2 = B.COL2
           AND A.START_DATE = B.START_DATE
        AND A.END_DATE = B.END_DATE;

TABLE1 and TABLE2 have 5.5 Billion records.

COL1 is the NPI on both table1 and tabl2.

START_DATE is PPI on both table1 and tabl2.

Stats are collected on all the join columns of both the tables. I still don't understand why I see low confidence though.

Also grouping by COL1,COL2,START_DATE,END_DATE gives duplicate entries and table 1 has PPI defined on another column not used in this query. So I am unable to use MERGE UPDATE.

The EXPLAIN is using sliding window merge join despite PPI created on the START_DATE on both the tables. Is there any way , I can avoid the sliding window merge join?

     Explain UPDATE A
     FROM  TEST.TABLE1 AS A
        , (SELECT DISTINCT /*PLAN_CD,*/COL1,COL2,START_DATE
             , END_DATE,PLCFLAG
           FROM TEST.TABLE2
          ) AS B
      SET PLACE_FLAG_CD_old = B.PLCFLAG
     WHERE A.COL1 = B.COL1
        AND A.COL2 = B.COL2
        /*AND A.PLAN_CD = B.PLAN_CD*/
        AND A.START_DATE = B.START_DATE
        AND A.END_DATE = B.END_DATE;
Explanation
--------------------------------------------------
 
  1)First, we lock a distinct TEST."pseudo table" for read on a RowHash to prevent global deadlock for TEST.TABLE2.  
  2)Next, we lock a distinct TEST."pseudo table" for write on a RowHash to prevent global deadlock for TEST.TABLE1.  
  3)We lock TEST.TABLE2 for read, and we lock TEST.TABLE1 for write.  
  4)We do an all-AMPs SUM step to aggregate from TEST.TABLE2 by way of an all-rows scan with a condition of ("(NOT (TEST.TABLE2.START_DATE IS NULL )) AND ((NOT (TEST.TABLE2.END_DATE IS NULL )) AND ((NOT (TEST.TABLE2.COL2 IS NULL )) AND (NOT (TEST.TABLE2.COL1 IS NULL ))))"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with low confidence to be 2,560,714,950 rows (245,828,635,200 bytes). The estimated time for this step is 20 minutes and 18 seconds.  
  5)We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (used to materialize view, derived table or table function B) (all_amps) (compressed columns allowed), which is built locally on the AMPs with Field1 ("UniqueId"). The size of Spool 1 is estimated with low confidence to be 2,560,714,950 rows (128,035,747,500 bytes). Spool AsgnList: "UniqueId", "COL1", "COL2", "START_DATE", "END_DATE", "plcFlag". The estimated time for this step is 1 minute and 17 seconds.  
  6)We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan into Spool 6 (all_amps) (compressed columns allowed), which is redistributed by hash code to all AMPs with hash fields ("Spool_1.COL1"). Then we do a SORT to order Spool 6 by row hash. The size of Spool 6 is estimated with low confidence to be 2,560,714,950 rows (107,550,027,900 bytes). Spool AsgnList: "COL1" = "Spool_1.COL1", "COL2" = "COL2", "START_DATE" = "START_DATE", "END_DATE" = "END_DATE", "PLCFLAG" = "PLCFLAG". The estimated time for this step is 4 minutes and 9 seconds.  
  7)We do an all-AMPs JOIN step (No Sum) from TEST.TABLE1 by way of a RowHash match scan, which is joined to Spool 6 (Last Use) by way of a RowHash match scan. TEST.TABLE1 and Spool 6 are joined using a sliding-window merge join, with a join condition of ("(TEST.TABLE1.COL1 = Spool_6.COL1) AND ((TEST.TABLE1.COL2 = Spool_6.COL2) AND ((TEST.TABLE1.START_DATE = Spool_6.START_DATE) AND (TEST.TABLE1.END_DATE = Spool_6.END_DATE )))"). The input table TEST.TABLE1 will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 5 (all_amps), which is built locally on the AMPs with hash fields ("TEST.TABLE1.ROWID") and Field1 ("TEST.TABLE1.ROWID"). Then we do a SORT to order Spool 5 by the sort key in spool field1. The size of Spool 5 is estimated with low confidence to be 2,560,714,950 rows (53,775,013,950 bytes). Spool AsgnList: "Field_1" = "TEST.TABLE1.ROWID", "PLCFLAG" = "{RightTable}.PLCFLAG". The estimated time for this step is 26 minutes and 58 seconds.  
  8)We do a MERGE Update to TEST.TABLE1 from Spool 5 (Last Use) via ROWID. The size is estimated with low confidence to be 2,560,714,950 rows (2,750,207,856,300 bytes). The estimated time for this step is 92 hours and 4 minutes.  
  9)Finally, we send out an END TRANSACTION step to all AMPs involved n processing the request.
No rows are returned to the user as the result of statement 1.
The total estimated time is 92 hours and 57 minutes.
BEGIN RECOMMENDED STATS ->
 
  10)"COLLECT STATISTICS COLUMN (COL1 ,COL2,START_DATE , END_DATE) ON TEST.TABLE2". (LowConf)  
  11)"COLLECT STATISTICS COLUMN (COL1 ,COL2 ,END_DATE, START_DATE) ON TEST.TABLE1". (LowConf) <- END RECOMMENDED STATS  


Thanks

Roopalini



     
  <Prev Next>  
<<First
<Prev
Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023