|
|
Archives of the TeradataForum
Message Posted: Mon, 17 Feb 2003 @ 19:43:05 GMT
Subj: | | INSERT/SELECT out of one table into two |
|
From: | | Frank C. Martinez IV |
Ok everyone, I've got another dumb question, When you do a INS/SEL operation from two+ tables with the same structure, same PI as the
resulting table, with the resulting table starting out empty, you get a very fast load. This works if you do a multistatement request (or
if you use a UNION, I've heard. Ok, so I'm old-fashioned!). Now, why doesn't that happen when you extract from one table into two+ tables,
same structure, same PI, or at least why doesn't the explain show the extraction happenning at the same time, so that synch scanning can be
accomplished. For example:
create table stg_y_1 as stg_yopenorder_hist with no data;
create table stg_y_2 as stg_yopenorder_hist with no data;
EXPLAIN
INSERT stg_y_1
SELECT *
FROM stg_yopenorder_hist
WHERE CAST(SUBSTR(EXTRACT_TS, 1, 10) AS DATE) < '2003-01-10'
;INSERT stg_y_2
SELECT *
FROM stg_yopenorder_hist
WHERE CAST(SUBSTR(EXTRACT_TS, 1, 10) AS DATE) >= '2003-01-10';
with the explain:
Explanation -------------------------------------------------- | | 1) First, we lock a distinct GDYR_HIST."pseudo table" for write on a RowHash to prevent global deadlock for GDYR_HIST.stg_y_2.
2) Next, we lock a distinct GDYR_HIST."pseudo table" for write on a RowHash to prevent global deadlock for GDYR_HIST.stg_y_1.
3) We lock a distinct GDYR_HIST."pseudo table" for read on a RowHash to prevent global deadlock for GDYR_HIST.stg_yopenorder_hist.
4) We lock GDYR_HIST.stg_y_2 for write, we lock GDYR_HIST.stg_y_1 for write, and we lock GDYR_HIST.stg_yopenorder_hist for read.
5) We do an all-AMPs RETRIEVE step from GDYR_HIST.stg_yopenorder_hist by way of an all-rows scan with a condition of
("(SUBSTR(GDYR_HIST.stg_yopenorder_hist.EXTRACT_TS ,1 ,10 )(DATE, FORMAT 'yyyy-mm-dd')LATIN)< DATE '2003-01-10'") into Spool 1, which is
built locally on the AMPs. Then we do a SORT to order Spool 1 by row hash. The input table will not be cached in memory, but it is
eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no
confidence to be 9,031,152 rows. The estimated time for this step is 2 hours and 29 minutes.
6) We execute the following steps in parallel.
- 1) We do a MERGE into GDYR_HIST.stg_y_1 from Spool 1 (Last Use).
- 2) We do an all-AMPs RETRIEVE step from GDYR_HIST.stg_yopenorder_hist by way of an all-rows scan with a condition of
("(SUBSTR(GDYR_HIST.stg_yopenorder_hist.EXTRACT_TS ,1 ,10)(DATE, FORMAT 'yyyy-mm-dd')LATIN)>= DATE '2003-01-10'") into Spool 2, which is
built locally on the AMPs. Then we do a SORT to order Spool 2 by row hash. The input table will not be cached in memory, but it is
eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 2 is estimated with no
confidence to be 9,031,152 rows. The estimated time for this step is 2 hours and 29 minutes.
7) We do a MERGE into GDYR_HIST.stg_y_2 from Spool 2 (Last Use).
8) We spoil the parser's dictionary cache for the table.
9) We spoil the parser's dictionary cache for the table.
10) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
| -> | No rows are returned to the user as the result of statement 1. No rows are returned to the user as the result of statement 2.
| |
Steps #5 and #6.2 are doing an all-amps retrieve, building the spool files locally. Both steps suggest that synch scan is an
option, but the explain puts the retrieve at different times. Is the only way to achieve a synch scan to start them up as different
processes at the same time? Ok, somebody klonk me on the head and straighten me out on this one.
iv
| |