Archives of the TeradataForum

Message Posted: Mon, 17 Feb 2003 @ 19:43:05 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		INSERT/SELECT out of one table into two

From:		Frank C. Martinez IV

Ok everyone, I've got another dumb question, When you do a INS/SEL operation from two+ tables with the same structure, same PI as the resulting table, with the resulting table starting out empty, you get a very fast load. This works if you do a multistatement request (or if you use a UNION, I've heard. Ok, so I'm old-fashioned!). Now, why doesn't that happen when you extract from one table into two+ tables, same structure, same PI, or at least why doesn't the explain show the extraction happenning at the same time, so that synch scanning can be accomplished. For example:

create table stg_y_1 as stg_yopenorder_hist with no data;
create table stg_y_2 as stg_yopenorder_hist with no data;

EXPLAIN
 INSERT stg_y_1
 SELECT *
   FROM stg_yopenorder_hist
  WHERE CAST(SUBSTR(EXTRACT_TS, 1, 10) AS DATE) < '2003-01-10'
;INSERT stg_y_2
 SELECT *
   FROM stg_yopenorder_hist
  WHERE CAST(SUBSTR(EXTRACT_TS, 1, 10) AS DATE) >= '2003-01-10';

with the explain:

1) First, we lock a distinct GDYR_HIST."pseudo table" for write on a RowHash to prevent global deadlock for GDYR_HIST.stg_y_2. 2) Next, we lock a distinct GDYR_HIST."pseudo table" for write on a RowHash to prevent global deadlock for GDYR_HIST.stg_y_1. 3) We lock a distinct GDYR_HIST."pseudo table" for read on a RowHash to prevent global deadlock for GDYR_HIST.stg_yopenorder_hist. 4) We lock GDYR_HIST.stg_y_2 for write, we lock GDYR_HIST.stg_y_1 for write, and we lock GDYR_HIST.stg_yopenorder_hist for read. 5) We do an all-AMPs RETRIEVE step from GDYR_HIST.stg_yopenorder_hist by way of an all-rows scan with a condition of ("(SUBSTR(GDYR_HIST.stg_yopenorder_hist.EXTRACT_TS ,1 ,10 )(DATE, FORMAT 'yyyy-mm-dd')LATIN)< DATE '2003-01-10'") into Spool 1, which is built locally on the AMPs. Then we do a SORT to order Spool 1 by row hash. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 9,031,152 rows. The estimated time for this step is 2 hours and 29 minutes. 6) We execute the following steps in parallel. - 1) We do a MERGE into GDYR_HIST.stg_y_1 from Spool 1 (Last Use). - 2) We do an all-AMPs RETRIEVE step from GDYR_HIST.stg_yopenorder_hist by way of an all-rows scan with a condition of ("(SUBSTR(GDYR_HIST.stg_yopenorder_hist.EXTRACT_TS ,1 ,10)(DATE, FORMAT 'yyyy-mm-dd')LATIN)>= DATE '2003-01-10'") into Spool 2, which is built locally on the AMPs. Then we do a SORT to order Spool 2 by row hash. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 2 is estimated with no confidence to be 9,031,152 rows. The estimated time for this step is 2 hours and 29 minutes. 7) We do a MERGE into GDYR_HIST.stg_y_2 from Spool 2 (Last Use). 8) We spoil the parser's dictionary cache for the table. 9) We spoil the parser's dictionary cache for the table. 10) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

`Explanation --------------------------------------------------`
	`->`	`No rows are returned to the user as the result of statement 1. No rows are returned to the user as the result of statement 2.`

Steps #5 and #6.2 are doing an all-amps retrieve, building the spool files locally. Both steps suggest that synch scan is an option, but the explain puts the retrieve at different times. Is the only way to achieve a synch scan to start them up as different processes at the same time? Ok, somebody klonk me on the head and straighten me out on this one.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference