Archives of the TeradataForum
Message Posted: Tue, 10 Sep 2013 @ 15:33:59 GMT
Subj: Merge delete performance tuning
From: Bakthavachalam, Roopalini
Hi Forum,
I am trying to tune the DELETE query below for our development team. I did a couple of things and brought the estimated time down from 17 hours to 3 hours, but I am unable to reduce it any further. Could any of you please tell me if there is anything more I can do? Stats are up to date on the F_PACE table. Although F_PACE is partitioned, no partition elimination takes place because of the condition
T1.AS_OF_CALENDAR_ID > CAST(CAST(T2.LAST_PACE_DT AS DATE FORMAT 'YYYYMMDD' ) AS CHAR( 8 )). Dropping the secondary index is what brought the estimate down to 3 hours.
F_PACE has 682 M rows and E_OPERA_BATCH_CONTROL has 72 rows.
DELETE T1
FROM C302_PRD_FS_EDW.F_PACE T1,
C302_PRD_FS_STAGE.V_E_OPERA_BATCH_CONTROL T2
WHERE T1.HOTEL_ID= T2.HOTEL_ID
AND T1.AS_OF_CALENDAR_ID > CAST(CAST(T2.LAST_PACE_DT AS DATE FORMAT 'YYYYMMDD' ) AS CHAR( 8 ))
AND T2.OPERA_BATCH_CD='JAC'
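One angle I have been considering (a sketch only, not verified on our release, and only valid if the 'JAC' batch rows share a single LAST_PACE_DT): resolve the cutoff first and supply it to the DELETE as an INTEGER literal, so the optimizer compares as_of_calendar_id against a constant of its own type and can eliminate CASE_N partitions statically. The value 20130801 below is purely illustrative.

```sql
-- Step 1 (sketch): resolve the cutoff to an integer outside the DELETE.
SELECT CAST(CAST(LAST_PACE_DT AS DATE FORMAT 'YYYYMMDD') AS CHAR(8)) (INTEGER)
FROM C302_PRD_FS_STAGE.V_E_OPERA_BATCH_CONTROL
WHERE OPERA_BATCH_CD = 'JAC';

-- Step 2 (sketch): plug the returned value in as a literal (20130801 is made up)
-- so static partition elimination on as_of_calendar_id becomes possible.
DELETE T1
FROM C302_PRD_FS_EDW.F_PACE T1,
     C302_PRD_FS_STAGE.V_E_OPERA_BATCH_CONTROL T2
WHERE T1.HOTEL_ID = T2.HOTEL_ID
  AND T1.AS_OF_CALENDAR_ID > 20130801
  AND T2.OPERA_BATCH_CD = 'JAC';
```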
C302_PRD_FS_EDW.F_PACE DDL
SHOW TABLE C302_PRD_FS_EDW.F_PACE;
CREATE MULTISET TABLE C302_PRD_FS_EDW.F_PACE ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
hotel_id INTEGER TITLE 'hotel_id' NOT NULL,
as_of_calendar_id INTEGER TITLE 'as_of_calendar_id' NOT NULL,
stay_calendar_id INTEGER TITLE 'stay_calendar_id' NOT NULL,
resv_calendar_id INTEGER TITLE 'resv_calendar_id' NOT NULL,
stay_market_id INTEGER TITLE 'stay_market_id' NOT NULL,
rate_code_id INTEGER TITLE 'rate_code_id' NOT NULL,
channel_id INTEGER TITLE 'channel_id' NOT NULL,
company_id INTEGER TITLE 'company_id' NOT NULL,
travel_agent_id INTEGER TITLE 'travel_agent_id' NOT NULL,
room_type_id INTEGER TITLE 'room_type_id' NOT NULL,
arrival_ind CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'arrival_ind' NOT NULL,
len_of_stay_nr INTEGER TITLE 'len_of_stay_nr' NOT NULL,
currency_type_cd VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'currency_type_cd' NOT NULL,
rooms_sold_nr INTEGER TITLE 'rooms_sold_nr' NOT NULL,
room_revenue_amt DECIMAL(19,4) TITLE 'room_revenue_amt' NOT NULL,
rooms_occ_nr INTEGER TITLE 'rooms_occ_nr' NOT NULL,
currency_cd VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'currency_cd',
batch_id INTEGER TITLE 'batch_id' NOT NULL,
last_load_ts DATE FORMAT 'yyyy-mm-dd' TITLE 'last_load_ts'
)
UNIQUE PRIMARY INDEX XPKF_Pace ( hotel_id ,as_of_calendar_id , stay_calendar_id ,resv_calendar_id ,stay_market_id
,rate_code_id , channel_id ,company_id ,travel_agent_id ,room_type_id ,arrival_ind
, len_of_stay_nr ,currency_type_cd )
PARTITION BY CASE_N
( as_of_calendar_id BETWEEN 20100101 AND 20100131 ,
as_of_calendar_id BETWEEN 20100201 AND 20100228 ,
as_of_calendar_id BETWEEN 20100301 AND 20100331 ,
as_of_calendar_id BETWEEN 20100401 AND 20100430 ,
as_of_calendar_id BETWEEN 20100501 AND 20100531 ,
as_of_calendar_id BETWEEN 20100601 AND 20100630 ,
as_of_calendar_id BETWEEN 20100701 AND 20100731 ,
as_of_calendar_id BETWEEN 20100801 AND 20100831 ,
as_of_calendar_id BETWEEN 20100901 AND 20100930 ,
as_of_calendar_id BETWEEN 20101001 AND 20101031 ,
as_of_calendar_id BETWEEN 20101101 AND 20101130 ,
as_of_calendar_id BETWEEN 20101201 AND 20101231 ,
as_of_calendar_id BETWEEN 20110101 AND 20110131 ,
as_of_calendar_id BETWEEN 20110201 AND 20110228 ,
as_of_calendar_id BETWEEN 20110301 AND 20110331 ,
as_of_calendar_id BETWEEN 20110401 AND 20110430 ,
as_of_calendar_id BETWEEN 20110501 AND 20110531 ,
as_of_calendar_id BETWEEN 20110601 AND 20110630 ,
as_of_calendar_id BETWEEN 20110701 AND 20110731 ,
as_of_calendar_id BETWEEN 20110801 AND 20110831 ,
as_of_calendar_id BETWEEN 20110901 AND 20110930 ,
as_of_calendar_id BETWEEN 20111001 AND 20111031 ,
as_of_calendar_id BETWEEN 20111101 AND 20111130 ,
as_of_calendar_id BETWEEN 20111201 AND 20111231 ,
as_of_calendar_id BETWEEN 20120101 AND 20120131 ,
as_of_calendar_id BETWEEN 20120201 AND 20120229 ,
as_of_calendar_id BETWEEN 20120301 AND 20120331 ,
as_of_calendar_id BETWEEN 20120401 AND 20120430 ,
as_of_calendar_id BETWEEN 20120501 AND 20120531 ,
as_of_calendar_id BETWEEN 20120601 AND 20120630 ,
as_of_calendar_id BETWEEN 20120701 AND 20120731 ,
as_of_calendar_id BETWEEN 20120801 AND 20120831 ,
as_of_calendar_id BETWEEN 20120901 AND 20120930 ,
as_of_calendar_id BETWEEN 20121001 AND 20121031 ,
as_of_calendar_id BETWEEN 20121101 AND 20121130 ,
as_of_calendar_id BETWEEN 20121201 AND 20121231 ,
as_of_calendar_id BETWEEN 20130101 AND 20130131 ,
as_of_calendar_id BETWEEN 20130201 AND 20130228 ,
as_of_calendar_id BETWEEN 20130301 AND 20130331 ,
as_of_calendar_id BETWEEN 20130401 AND 20130430 ,
as_of_calendar_id BETWEEN 20130501 AND 20130531 ,
as_of_calendar_id BETWEEN 20130601 AND 20130630 ,
as_of_calendar_id BETWEEN 20130701 AND 20130731 ,
as_of_calendar_id BETWEEN 20130801 AND 20130831 ,
as_of_calendar_id BETWEEN 20130901 AND 20130930 ,
as_of_calendar_id BETWEEN 20131001 AND 20131031 ,
as_of_calendar_id BETWEEN 20131101 AND 20131130 ,
as_of_calendar_id BETWEEN 20131201 AND 20131231 ,
NO CASE) INDEX ( hotel_id );
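As a side note on the DDL (an equivalence sketch, not a fix for the runtime issue): the 48 monthly CASE_N bands above could be written more compactly with RANGE_N. On a YYYYMMDD integer, EACH 100 places all days of a month in one band; the bands covering invalid day numbers (32-99) simply stay empty. The column and index lists are elided here.

```sql
-- Sketch of an equivalent, shorter partitioning expression.
-- NO RANGE plays the role of the NO CASE catch-all above.
CREATE MULTISET TABLE C302_PRD_FS_EDW.F_PACE ( ... )
UNIQUE PRIMARY INDEX XPKF_Pace ( ... )
PARTITION BY RANGE_N(as_of_calendar_id BETWEEN 20100101 AND 20131231
                     EACH 100, NO RANGE);
```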
C302_PRD_FS_STAGE.V_E_OPERA_BATCH_CONTROL DDL
REPLACE VIEW C302_prd_fs_stage.V_E_Opera_Batch_Control
AS LOCKING ROW IN ACCESS
SELECT opera_batch_cd, hotel_cd, last_run_dt, last_pace_dt, hotel_id FROM C302_PRD_FS_STAGE.E_Opera_Batch_Control
Original Explain
Explain DELETE T1
FROM C302_PRD_FS_EDW.F_PACE T1,
C302_PRD_FS_STAGE.V_E_OPERA_BATCH_CONTROL T2
WHERE T1.HOTEL_ID= T2.HOTEL_ID
AND T1.AS_OF_CALENDAR_ID > CAST(CAST(T2.LAST_PACE_DT AS DATE FORMAT 'YYYYMMDD' ) AS CHAR( 8 ))
AND T2.OPERA_BATCH_CD='JAC'
Explanation
--------------------------------------------------
1) First, we lock a distinct C302_PRD_FS_EDW."pseudo table" for write on a RowHash to prevent global deadlock for C302_PRD_FS_EDW.T1.
2) Next, we lock C302_PRD_FS_STAGE.E_Opera_Batch_Control in view V_E_OPERA_BATCH_CONTROL for access, and we lock C302_PRD_FS_EDW.T1 for write.
3) We do an all-AMPs RETRIEVE step from C302_PRD_FS_STAGE.E_Opera_Batch_Control in view V_E_OPERA_BATCH_CONTROL by way of an all-rows scan
with a condition of ("(C302_PRD_FS_STAGE.E_Opera_Batch_Control in view V_E_OPERA_BATCH_CONTROL.opera_batch_cd = 'JAC') AND (NOT
(C302_PRD_FS_STAGE.E_Opera_Batch_Control in view V_E_OPERA_BATCH_CONTROL.hotel_id IS NULL ))") into Spool 2 (all_amps), which is duplicated on all
AMPs with hash fields ("C302_PRD_FS_STAGE.E_Opera_Batch_Control.hotel_id"). Then we do a SORT to order Spool 2 by row hash. The size of Spool 2
is estimated with high confidence to be 48 rows (1,008 bytes). Spool Asgnlist: "last_pace_dt" = "last_pace_dt", "hotel_id" =
"C302_PRD_FS_STAGE.E_Opera_Batch_Control.hotel_id". The estimated time for this step is 0.01 seconds.
4) We do an all-AMPs JOIN step (No Sum) from Spool 2 (Last Use) by way of an all-rows scan, which is joined to C302_PRD_FS_EDW.T1 by way of a
traversal of index # 4 without accessing the base table extracting row ids only. Spool 2 and C302_PRD_FS_EDW.T1 are joined using a nested join,
with a join condition of ("C302_PRD_FS_EDW.T1.hotel_id = Spool_2.hotel_id"). The input table C302_PRD_FS_EDW.T1 will not be cached in memory.
The result goes into Spool 3 (all_amps), which is built locally on the AMPs with Field1 ("'00000000000000000000'XB"). Then we do a SORT to order
Spool 3 by field Id 1. The size of Spool 3 is estimated with low confidence to be 10,904,180 rows (294,412,860 bytes). Spool Asgnlist:
"'00000000000000000000'XB", "last_pace_dt" = "last_pace_dt". The estimated time for this step is 4.92 seconds.
5) We do an all-AMPs JOIN step (No Sum) from Spool 3 (Last Use) by way of an all-rows scan, which is joined to C302_PRD_FS_EDW.T1 by way of an
all-rows scan with no residual conditions. Spool 3 and C302_PRD_FS_EDW.T1 are joined using a row id join, with a join condition of
("(C302_PRD_FS_EDW.T1.as_of_calendar_id )> (TRIM(BOTH FROM last_pace_dt (CHAR(8), CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT
'YYYYMMDD'))(CHAR(8), CHARACTER SET LATIN, NOT CASESPECIFIC)(FLOAT, FORMAT '-9.99999999999999E-999'))"). The input table C302_PRD_FS_EDW.T1 will
not be cached in memory. The result goes into Spool 1 (all_amps), which is built locally on the AMPs with hash fields
("C302_PRD_FS_EDW.T1.ROWID") and Field1 ("C302_PRD_FS_EDW.T1.ROWID"). Then we do a SORT to partition Spool 1 by rowkey and the sort key in spool
field1 eliminating duplicate rows. The size of Spool 1 is estimated with low confidence to be 10,904,180 rows (196,275,240 bytes). Spool
Asgnlist: "Field_1" = "C302_PRD_FS_EDW.T1.ROWID". The estimated time for this step is 1 minute and 39 seconds.
6) We do an all-AMPs MERGE DELETE to C302_PRD_FS_EDW.T1 from Spool 1 (Last Use) via the row id. The size is estimated with low confidence to
be 10,904,180 rows. The estimated time for this step is 17 hours and 36 minutes.
7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1. The total estimated time is 17 hours and 38 minutes.
I dropped the secondary index and below is the explain plan after that.
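For reference, the index drop (and the re-create once the purge is done) might look like the following, assuming the secondary index in question is the unnamed NUSI on hotel_id shown in the DDL above:

```sql
-- Drop the NUSI so the MERGE DELETE avoids per-row index maintenance.
DROP INDEX (hotel_id) ON C302_PRD_FS_EDW.F_PACE;

-- ... run the DELETE ...

-- Re-create it afterwards; a bulk index build is far cheaper than
-- maintaining the index row by row during the delete.
CREATE INDEX (hotel_id) ON C302_PRD_FS_EDW.F_PACE;
```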
Explain DELETE T1
FROM FS_SANDBOX.F_PACE_PROD T1,
FS_SANDBOX.V_E_OPERA_BATCH_CONTROL T2
WHERE T1.HOTEL_ID= T2.HOTEL_ID
AND T1.AS_OF_CALENDAR_ID > MOD_LAST_PACE_DT AND T2.OPERA_BATCH_CD='JAC'
Explanation
--------------------------------------------------
1) First, we lock a distinct FS_SANDBOX."pseudo table" for write on a RowHash to prevent global deadlock for FS_SANDBOX.T1.
2) Next, we lock FS_SANDBOX.T1 for write, and we lock FS_SANDBOX.E_OPERA_BATCH_CONTROL in view V_E_OPERA_BATCH_CONTROL for access.
3) We do an all-AMPs RETRIEVE step from FS_SANDBOX.E_OPERA_BATCH_CONTROL in view V_E_OPERA_BATCH_CONTROL by way of an all-rows scan with a
condition of ("(FS_SANDBOX.E_OPERA_BATCH_CONTROL in view V_E_OPERA_BATCH_CONTROL.opera_batch_cd = 'JAC') AND (NOT
(FS_SANDBOX.E_OPERA_BATCH_CONTROL in view V_E_OPERA_BATCH_CONTROL.hotel_id IS NULL ))") into Spool 2 (all_amps), which is duplicated on all AMPs
with hash fields ("FS_SANDBOX.E_OPERA_BATCH_CONTROL.hotel_id"). The size of Spool 2 is estimated with high confidence to be 48 rows (1,008
bytes). Spool Asgnlist: "last_pace_dt" = "last_pace_dt", "hotel_id" = "FS_SANDBOX.E_OPERA_BATCH_CONTROL.hotel_id". The estimated time for this
step is 0.02 seconds.
4) We do an all-AMPs JOIN step (No Sum) from Spool 2 (Last Use) by way of an all-rows scan, which is joined to FS_SANDBOX.T1 by way of an
all-rows scan with no residual conditions. Spool 2 and FS_SANDBOX.T1 are joined using a single partition hash join, with a join condition of
("(FS_SANDBOX.T1.hotel_id = Spool_2.hotel_id) AND ((FS_SANDBOX.T1.as_of_calendar_id )> (TRIM(BOTH FROM {LeftTable}.last_pace_dt (CHAR(8),
CHARACTER SET LATIN, NOT CASESPECIFIC, FORMAT 'YYYYMMDD'))(CHAR(8), CHARACTER SET LATIN, NOT CASESPECIFIC)(FLOAT, FORMAT '-9.99999999999999E-
999')))"). The input table FS_SANDBOX.T1 will not be cached in memory. The result goes into Spool 1 (all_amps), which is built locally on the
AMPs with hash fields ("FS_SANDBOX.T1.ROWID") and Field1 ("FS_SANDBOX.T1.ROWID"). Then we do a SORT to partition Spool 1 by rowkey and the sort
key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with low confidence to be 10,887,491 rows (195,974,838 bytes).
Spool Asgnlist: "Field_1" = "FS_SANDBOX.T1.ROWID". The estimated time for this step is 1 minute and 46 seconds.
5) We do an all-AMPs MERGE DELETE to FS_SANDBOX.T1 from Spool 1 (Last Use) via the row id. The size is estimated with low confidence to be
10,887,491 rows. The estimated time for this step is 2 hours and 58 minutes.
6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1. The total estimated time is 3 hours.
Thanks
Roopalini