Archives of the TeradataForum

Message Posted: Tue, 14 Jul 2015 @ 19:14:34 GMT
Subj: Update query running slow

From: Bakthavachalam, Roopalini
Forum - The query below has been running for 90 minutes on a 22-node, 540-AMP Teradata system. I am trying to bring down the execution time with no luck. Is there anything I can try to reduce it?

Here is the query with its EXPLAIN:
UPDATE A
FROM TEST.TABLE1 AS A
, (SELECT DISTINCT COL1,COL2,START_DATE
, END_DATE,PLCFLAG
FROM TEST.TABLE2
) AS B
SET PLACE_FLAG_CD_old = B.PLCFLAG
WHERE A.COL1 = B.COL1
AND A.COL2 = B.COL2
AND A.START_DATE = B.START_DATE
AND A.END_DATE = B.END_DATE;
TABLE1 and TABLE2 have 5.5 billion records. COL1 is the NPI on both TABLE1 and TABLE2, and START_DATE is the PPI on both TABLE1 and TABLE2. Stats are collected on all the join columns of both tables, yet I still don't understand why I see low confidence.
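One common reason for low-confidence estimates on a multi-column join is that only single-column statistics exist, so the optimizer cannot estimate the combined selectivity of the four-column join key. The EXPLAIN output itself recommends multicolumn statistics (its steps 10 and 11); as a sketch, with the column order taken from that recommendation:

```sql
-- Multicolumn statistics on the full join key, as recommended by the
-- optimizer in the EXPLAIN output below (steps 10 and 11).
COLLECT STATISTICS COLUMN (COL1, COL2, START_DATE, END_DATE) ON TEST.TABLE2;
COLLECT STATISTICS COLUMN (COL1, COL2, END_DATE, START_DATE) ON TEST.TABLE1;
```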
Also, grouping by COL1, COL2, START_DATE, END_DATE still gives duplicate entries, and TABLE1 has a PPI defined on another column not used in this query, so I am unable to use MERGE UPDATE.

The EXPLAIN uses a sliding-window merge join despite the PPI on START_DATE on both tables. Is there any way I can avoid the sliding-window merge join?
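If the duplicate entries differ only in PLCFLAG, one workaround (a sketch, untested against this schema; the ORDER BY PLCFLAG tie-break is an assumption about which duplicate should win) is to replace DISTINCT with a ROW_NUMBER() deduplication via QUALIFY, which guarantees at most one source row per join key:

```sql
-- Hypothetical sketch: keep exactly one row per join key instead of
-- using DISTINCT. QUALIFY filters on the windowed row number; the
-- ORDER BY PLCFLAG tie-break is an assumption -- choose the rule that
-- matches which duplicate should win in your data.
UPDATE A
FROM TEST.TABLE1 AS A
, (SELECT COL1, COL2, START_DATE, END_DATE, PLCFLAG
   FROM TEST.TABLE2
   QUALIFY ROW_NUMBER() OVER (
       PARTITION BY COL1, COL2, START_DATE, END_DATE
       ORDER BY PLCFLAG
   ) = 1
  ) AS B
SET PLACE_FLAG_CD_old = B.PLCFLAG
WHERE A.COL1 = B.COL1
AND A.COL2 = B.COL2
AND A.START_DATE = B.START_DATE
AND A.END_DATE = B.END_DATE;
```

With one row per key the result no longer depends on which duplicate the join happens to pick. (Teradata's MERGE additionally requires the ON clause to cover the target's primary index and partitioning column, which is presumably why MERGE UPDATE was ruled out here.)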
Explain UPDATE A
FROM TEST.TABLE1 AS A
, (SELECT DISTINCT /*PLAN_CD,*/COL1,COL2,START_DATE
, END_DATE,PLCFLAG
FROM TEST.TABLE2
) AS B
SET PLACE_FLAG_CD_old = B.PLCFLAG
WHERE A.COL1 = B.COL1
AND A.COL2 = B.COL2
/*AND A.PLAN_CD = B.PLAN_CD*/
AND A.START_DATE = B.START_DATE
AND A.END_DATE = B.END_DATE;
Explanation
--------------------------------------------------
  1) First, we lock a distinct TEST."pseudo table" for read on a RowHash to prevent global deadlock for TEST.TABLE2.
  2) Next, we lock a distinct TEST."pseudo table" for write on a RowHash to prevent global deadlock for TEST.TABLE1.
  3) We lock TEST.TABLE2 for read, and we lock TEST.TABLE1 for write.
  4) We do an all-AMPs SUM step to aggregate from TEST.TABLE2 by way of an all-rows scan with a condition of ("(NOT (TEST.TABLE2.START_DATE IS NULL )) AND ((NOT (TEST.TABLE2.END_DATE IS NULL )) AND ((NOT (TEST.TABLE2.COL2 IS NULL )) AND (NOT (TEST.TABLE2.COL1 IS NULL ))))"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with low confidence to be 2,560,714,950 rows (245,828,635,200 bytes). The estimated time for this step is 20 minutes and 18 seconds.
  5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (used to materialize view, derived table or table function B) (all_amps) (compressed columns allowed), which is built locally on the AMPs with Field1 ("UniqueId"). The size of Spool 1 is estimated with low confidence to be 2,560,714,950 rows (128,035,747,500 bytes). Spool AsgnList: "UniqueId", "COL1", "COL2", "START_DATE", "END_DATE", "plcFlag". The estimated time for this step is 1 minute and 17 seconds.
  6) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan into Spool 6 (all_amps) (compressed columns allowed), which is redistributed by hash code to all AMPs with hash fields ("Spool_1.COL1"). Then we do a SORT to order Spool 6 by row hash. The size of Spool 6 is estimated with low confidence to be 2,560,714,950 rows (107,550,027,900 bytes). Spool AsgnList: "COL1" = "Spool_1.COL1", "COL2" = "COL2", "START_DATE" = "START_DATE", "END_DATE" = "END_DATE", "PLCFLAG" = "PLCFLAG". The estimated time for this step is 4 minutes and 9 seconds.
  7) We do an all-AMPs JOIN step (No Sum) from TEST.TABLE1 by way of a RowHash match scan, which is joined to Spool 6 (Last Use) by way of a RowHash match scan. TEST.TABLE1 and Spool 6 are joined using a sliding-window merge join, with a join condition of ("(TEST.TABLE1.COL1 = Spool_6.COL1) AND ((TEST.TABLE1.COL2 = Spool_6.COL2) AND ((TEST.TABLE1.START_DATE = Spool_6.START_DATE) AND (TEST.TABLE1.END_DATE = Spool_6.END_DATE )))"). The input table TEST.TABLE1 will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 5 (all_amps), which is built locally on the AMPs with hash fields ("TEST.TABLE1.ROWID") and Field1 ("TEST.TABLE1.ROWID"). Then we do a SORT to order Spool 5 by the sort key in spool field1. The size of Spool 5 is estimated with low confidence to be 2,560,714,950 rows (53,775,013,950 bytes). Spool AsgnList: "Field_1" = "TEST.TABLE1.ROWID", "PLCFLAG" = "{RightTable}.PLCFLAG". The estimated time for this step is 26 minutes and 58 seconds.
  8) We do a MERGE Update to TEST.TABLE1 from Spool 5 (Last Use) via ROWID. The size is estimated with low confidence to be 2,560,714,950 rows (2,750,207,856,300 bytes). The estimated time for this step is 92 hours and 4 minutes.
  9) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
  -> No rows are returned to the user as the result of statement 1. The total estimated time is 92 hours and 57 minutes.

BEGIN RECOMMENDED STATS ->
 10) "COLLECT STATISTICS COLUMN (COL1, COL2, START_DATE, END_DATE) ON TEST.TABLE2". (LowConf)
 11) "COLLECT STATISTICS COLUMN (COL1, COL2, END_DATE, START_DATE) ON TEST.TABLE1". (LowConf)
<- END RECOMMENDED STATS
Thanks
Roopalini