Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 10 Sep 2013 @ 15:33:59 GMT


     
  <Prev Next>  
<<First
<Prev
Next> Last>>  


Subj:   Merge delete performance tuning
 
From:   Bakthavachalam, Roopalini

Hi Forum,

I am trying to tune the below DELETE query for our development team. I did couple things and could bring down the estimated time from 17 hours to 3 hours. However I am unable to bring it down any further. Could any of you please tell me, if there is anything more I can do ? The stats are updated for F_PACE table. Though the F_PACE table is partitioned , it is not using any of the partitions because of the condition T1.AS_OF_CALENDAR_ID > CAST(CAST(T2.LAST_PACE_DT AS DATE FORMAT 'YYYYMMDD' ) AS CHAR( 8 )). I dropped the secondary index and that brought it down to 3 hours.

F_PACE table has 682 M records and E_OPERA_BATCH_CONTROL has 72 records.

     DELETE T1
     FROM C302_PRD_FS_EDW.F_PACE T1,
     C302_PRD_FS_STAGE.V_E_OPERA_BATCH_CONTROL T2
      WHERE T1.HOTEL_ID= T2.HOTEL_ID
     AND T1.AS_OF_CALENDAR_ID > CAST(CAST(T2.LAST_PACE_DT  AS DATE FORMAT 'YYYYMMDD' ) AS CHAR( 8 ))
      AND T2.OPERA_BATCH_CD='JAC'

     C302_PRD_FS_EDW.F_PACE  DDL

     SHOW TABLE

     C302_PRD_FS_EDW.F_PACE

     CREATE MULTISET TABLE C302_PRD_FS_EDW.F_PACE ,NO FALLBACK ,
          NO BEFORE JOURNAL,
          NO AFTER JOURNAL,
          CHECKSUM = DEFAULT,
          DEFAULT MERGEBLOCKRATIO
          (
           hotel_id INTEGER TITLE 'hotel_id' NOT NULL,
           as_of_calendar_id INTEGER TITLE 'as_of_calendar_id' NOT NULL,
           stay_calendar_id INTEGER TITLE 'stay_calendar_id' NOT NULL,
           resv_calendar_id INTEGER TITLE 'resv_calendar_id' NOT NULL,
           stay_market_id INTEGER TITLE 'stay_market_id' NOT NULL,
           rate_code_id INTEGER TITLE 'rate_code_id' NOT NULL,
           channel_id INTEGER TITLE 'channel_id' NOT NULL,
           company_id INTEGER TITLE 'company_id' NOT NULL,
           travel_agent_id INTEGER TITLE 'travel_agent_id' NOT NULL,
           room_type_id INTEGER TITLE 'room_type_id' NOT NULL,
           arrival_ind CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'arrival_ind' NOT NULL,
           len_of_stay_nr INTEGER TITLE 'len_of_stay_nr' NOT NULL,
           currency_type_cd VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'currency_type_cd' NOT NULL,
           rooms_sold_nr INTEGER TITLE 'rooms_sold_nr' NOT NULL,
           room_revenue_amt DECIMAL(19,4) TITLE 'room_revenue_amt' NOT NULL,
           rooms_occ_nr INTEGER TITLE 'rooms_occ_nr' NOT NULL,
           currency_cd VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'currency_cd',
           batch_id INTEGER TITLE 'batch_id' NOT NULL,
           last_load_ts DATE FORMAT 'yyyy-mm-dd' TITLE 'last_load_ts'
          )

          UNIQUE PRIMARY INDEX XPKF_Pace ( hotel_id ,as_of_calendar_id , stay_calendar_id ,resv_calendar_id ,stay_market_id
                                         ,rate_code_id , channel_id ,company_id ,travel_agent_id ,room_type_id ,arrival_ind
                                         , len_of_stay_nr ,currency_type_cd )

          PARTITION BY CASE_N
             ( as_of_calendar_id BETWEEN 20100101 AND 20100131 ,
               as_of_calendar_id BETWEEN 20100201 AND 20100228 ,
               as_of_calendar_id BETWEEN 20100301 AND 20100331 ,
               as_of_calendar_id BETWEEN 20100401 AND 20100430 ,
               as_of_calendar_id BETWEEN 20100501 AND 20100531 ,
               as_of_calendar_id BETWEEN 20100601 AND 20100630 ,
               as_of_calendar_id BETWEEN 20100701 AND 20100731 ,
               as_of_calendar_id BETWEEN 20100801 AND 20100831 ,
               as_of_calendar_id BETWEEN 20100901 AND 20100930 ,
               as_of_calendar_id BETWEEN 20101001 AND 20101031 ,
               as_of_calendar_id BETWEEN 20101101 AND 20101130 ,
               as_of_calendar_id BETWEEN 20101201 AND 20101231 ,
               as_of_calendar_id BETWEEN 20110101 AND 20110131 ,
               as_of_calendar_id BETWEEN 20110201 AND 20110228 ,
               as_of_calendar_id BETWEEN 20110301 AND 20110331 ,
               as_of_calendar_id BETWEEN 20110401 AND 20110430 ,
               as_of_calendar_id BETWEEN 20110501 AND 20110531 ,
               as_of_calendar_id BETWEEN 20110601 AND 20110630 ,
               as_of_calendar_id BETWEEN 20110701 AND 20110731 ,
               as_of_calendar_id BETWEEN 20110801 AND 20110831 ,
               as_of_calendar_id BETWEEN 20110901 AND 20110930 ,
               as_of_calendar_id BETWEEN 20111001 AND 20111031 ,
               as_of_calendar_id BETWEEN 20111101 AND 20111130 ,
               as_of_calendar_id BETWEEN 20111201 AND 20111231 ,
               as_of_calendar_id BETWEEN 20120101 AND 20120131 ,
               as_of_calendar_id BETWEEN 20120201 AND 20120229 ,
               as_of_calendar_id BETWEEN 20120301 AND 20120331 ,
               as_of_calendar_id BETWEEN 20120401 AND 20120430 ,
               as_of_calendar_id BETWEEN 20120501 AND 20120531 ,
               as_of_calendar_id BETWEEN 20120601 AND 20120630 ,
               as_of_calendar_id BETWEEN 20120701 AND 20120731 ,
               as_of_calendar_id BETWEEN 20120801 AND 20120831 ,
               as_of_calendar_id BETWEEN 20120901 AND 20120930 ,
               as_of_calendar_id BETWEEN 20121001 AND 20121031 ,
               as_of_calendar_id BETWEEN 20121101 AND 20121130 ,
               as_of_calendar_id BETWEEN 20121201 AND 20121231 ,
               as_of_calendar_id BETWEEN 20130101 AND 20130131 ,
               as_of_calendar_id BETWEEN 20130201 AND 20130228 ,
               as_of_calendar_id BETWEEN 20130301 AND 20130331 ,
               as_of_calendar_id BETWEEN 20130401 AND 20130430 ,
               as_of_calendar_id BETWEEN 20130501 AND 20130531 ,
               as_of_calendar_id BETWEEN 20130601 AND 20130630 ,
               as_of_calendar_id BETWEEN 20130701 AND 20130731 ,
               as_of_calendar_id BETWEEN 20130801 AND 20130831 ,
               as_of_calendar_id BETWEEN 20130901 AND 20130930 ,
               as_of_calendar_id BETWEEN 20131001 AND 20131031 ,
               as_of_calendar_id BETWEEN 20131101 AND 20131130 ,
               as_of_calendar_id BETWEEN 20131201 AND 20131231 ,
             NO CASE) INDEX ( hotel_id );

     C302_PRD_FS_STAGE.V_E_OPERA_BATCH_CONTROL DDL

     REPLACE VIEW C302_prd_fs_stage.V_E_Opera_Batch_Control
       AS LOCKING ROW IN ACCESS
     SELECT opera_batch_cd, hotel_cd, last_run_dt, last_pace_dt, hotel_id FROM  C302_PRD_FS_STAGE.E_Opera_Batch_Control

Original Explain

     Explain DELETE                T1
     FROM C302_PRD_FS_EDW.F_PACE T1,
     C302_PRD_FS_STAGE.V_E_OPERA_BATCH_CONTROL T2
     WHERE T1.HOTEL_ID= T2.HOTEL_ID
     AND T1.AS_OF_CALENDAR_ID > CAST(CAST(T2.LAST_PACE_DT  AS DATE FORMAT 'YYYYMMDD' ) AS CHAR( 8 ))
     AND T2.OPERA_BATCH_CD='JAC'
Explanation
--------------------------------------------------
 
  1)First, we lock a distinct C302_PRD_FS_EDW."pseudo table" for write on a RowHash to prevent global deadlock for C302_PRD_FS_EDW.T1.  
  2)Next, we lock C302_PRD_FS_STAGE.E_Opera_Batch_Control in view V_E_OPERA_BATCH_CONTROL for access, and we lock C302_PRD_FS_EDW.T1 for write.  
  3)We do an all-AMPs RETRIEVE step from C302_PRD_FS_STAGE.E_Opera_Batch_Control in view V_E_OPERA_BATCH_CONTROL by way of an all-rows scan with a condition of ("(C302_PRD_FS_STAGE.E_Opera_Batch_Control in view V_E_OPERA_BATCH_CONTROL.opera_batch_cd = 'JAC') AND (NOT (C302_PRD_FS_STAGE.E_Opera_Batch_Control in view V_E_OPERA_BATCH_CONTROL.hotel_id IS NULL ))") into Spool 2 (all_amps), which is duplicated on all AMPs with hash fields ( "C302_PRD_FS_STAGE.E_Opera_Batch_Control.hotel_id"). Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with high confidence to be 48 rows (1,008 bytes). Spool Asgnlist: "last_pace_dt" = "last_pace_dt", "hotel_id" = "C302_PRD_FS_STAGE.E_Opera_Batch_Control.hotel_id". The estimated time for this step is 0.01 seconds.  
  4)We do an all-AMPs JOIN step (No Sum) from Spool 2 (Last Use) by way of an all-rows scan, which is joined to C302_PRD_FS_EDW.T1 by way of a traversal of index # 4 without accessing the base table extracting row ids only. Spool 2 and C302_PRD_FS_EDW.T1 are joined using a nested join, with a join condition of ( "C302_PRD_FS_EDW.T1.hotel_id = Spool_2.hotel_id"). The input table C302_PRD_FS_EDW.T1 will not be cached in memory. The result goes into Spool 3 (all_amps), which is built locally on the AMPs with Field1 ("'00000000000000000000'XB"). Then we do a SORT to order Spool 3 by field Id 1. The size of Spool 3 is estimated with low confidence to be 10,904,180 rows (294,412,860 bytes). Spool Asgnlist: "'00000000000000000000'XB", "last_pace_dt" = "last_pace_dt". The estimated time for this step is 4.92 seconds.  
  5)We do an all-AMPs JOIN step (No Sum) from Spool 3 (Last Use) by way of an all-rows scan, which is joined to C302_PRD_FS_EDW.T1 by way of an all-rows scan with no residual conditions. Spool 3 and C302_PRD_FS_EDW.T1 are joined using a row id join, with a join condition of ("(C302_PRD_FS_EDW.T1.as_of_calendar_id )> (TRIM(BOTH FROM last_pace_dt (CHAR(8), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT 'YYYYMMDD'))(CHAR(8), CHARACTER SET LATIN, NOT CASESPECIFIC)(FLOAT, FORMAT '-9.99999999999999E-999'))"). The input table C302_PRD_FS_EDW.T1 will not be cached in memory. The result goes into Spool 1 (all_amps), which is built locally on the AMPs with hash fields ("C302_PRD_FS_EDW.T1.ROWID") and Field1 ( "C302_PRD_FS_EDW.T1.ROWID"). Then we do a SORT to partition Spool 1 by rowkey and the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with low confidence to be 10,904,180 rows (196,275,240 bytes). Spool Asgnlist: "Field_1" = "C302_PRD_FS_EDW.T1.ROWID". The estimated time for this step is 1 minute and 39 seconds.  
  6)We do an all-AMPs MERGE DELETE to C302_PRD_FS_EDW.T1 from Spool 1 (Last Use) via the row id. The size is estimated with low confidence to be 10,904,180 rows. The estimated time for this step is 17 hours and 36 minutes.  
  7)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->No rows are returned to the user as the result of statement 1. The total estimated time is 17 hours and 38 minutes.  


I dropped the secondary index and below is the explain plan after that.

     Explain DELETE T1
     FROM FS_SANDBOX.F_PACE_PROD T1,
     FS_SANDBOX.V_E_OPERA_BATCH_CONTROL T2
     WHERE T1.HOTEL_ID= T2.HOTEL_ID
     AND T1.AS_OF_CALENDAR_ID > MOD_LAST_PACE_DT AND T2.OPERA_BATCH_CD='JAC'
Explanation
--------------------------------------------------
 
  1)First, we lock a distinct FS_SANDBOX."pseudo table" for write on a RowHash to prevent global deadlock for FS_SANDBOX.T1.  
  2)Next, we lock FS_SANDBOX.T1 for write, and we lock FS_SANDBOX.E_OPERA_BATCH_CONTROL in view V_E_OPERA_BATCH_CONTROL for access.  
  3)We do an all-AMPs RETRIEVE step from FS_SANDBOX.E_OPERA_BATCH_CONTROL in view V_E_OPERA_BATCH_CONTROL by way of an all-rows scan with a condition of ( "(FS_SANDBOX.E_OPERA_BATCH_CONTROL in view V_E_OPERA_BATCH_CONTROL.opera_batch_cd = 'JAC') AND (NOT (FS_SANDBOX.E_OPERA_BATCH_CONTROL in view V_E_OPERA_BATCH_CONTROL.hotel_id IS NULL ))") into Spool 2 (all_amps), which is duplicated on all AMPs with hash fields ( "FS_SANDBOX.E_OPERA_BATCH_CONTROL.hotel_id"). The size of Spool 2 is estimated with high confidence to be 48 rows (1,008 bytes). Spool Asgnlist: "last_pace_dt" = "last_pace_dt", "hotel_id" = "FS_SANDBOX.E_OPERA_BATCH_CONTROL.hotel_id". The estimated time for this step is 0.02 seconds.  
  4)We do an all-AMPs JOIN step (No Sum) from Spool 2 (Last Use) by way of an all-rows scan, which is joined to FS_SANDBOX.T1 by way of an all- rows scan with no residual conditions. Spool 2 and FS_SANDBOX.T1 are joined using a single partition hash_ join, with a join condition of ("(FS_SANDBOX.T1.hotel_id = Spool_2.hotel_id) AND ((FS_SANDBOX.T1.as_of_calendar_id )> (TRIM(BOTH FROM {LeftTable}.last_pace_dt (CHAR(8), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT 'YYYYMMDD'))(CHAR(8), CHARACTER SET LATIN, NOT CASESPECIFIC)(FLOAT, FORMAT '-9.99999999999999E- 999')))"). The input table FS_SANDBOX.T1 will not be cached in memory. The result goes into Spool 1 (all_amps), which is built locally on the AMPs with hash fields ("FS_SANDBOX.T1.ROWID") and Field1 ( "FS_SANDBOX.T1.ROWID"). Then we do a SORT to partition Spool 1 by rowkey and the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with low confidence to be 10,887,491 rows (195,974,838 bytes). Spool Asgnlist: "Field_1" = "FS_SANDBOX.T1.ROWID". The estimated time for this step is 1 minute and 46 seconds.  
  5)We do an all-AMPs MERGE DELETE to FS_SANDBOX.T1 from Spool 1 (Last Use) via the row id. The size is estimated with low confidence to be 10,887,491 rows. The estimated time for this step is 2 hours and 58 minutes.  
  6)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->No rows are returned to the user as the result of statement 1. The total estimated time is 3 hours.  


Thanks

Roopalini



     
  <Prev Next>  
<<First
<Prev
Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016