Archives of the TeradataForum

Message Posted: Thu, 19 Mar 2009 @ 22:19:57 GMT
Subj: Why redistribution in this select with UNION
From: Simard Rudel
Hi all,
I have a first table with 500,000,000 records and a second table with 200,000,000 records. I want to merge these two tables into one table, using the UNION command. The explain plan shows a redistribution even though all three tables have the same NUPI. I have included the explain output and the DDL below. I also looked at the explain for a straight copy of a table with the same NUPI, and there Teradata does not redistribute.
I don't understand why Teradata wants to do the redistribution with UNION. My guess is that Teradata wants to be sure not to end up with duplicate data.
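If the duplicate elimination is not actually needed, UNION ALL should avoid the sort and redistribution entirely. A minimal sketch against the same tables, assuming duplicates between the two sources either cannot occur or are acceptable:

-- Sketch: same insert, but UNION ALL skips duplicate elimination,
-- so no whole-row sort/redistribution step should be needed.
explain
insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_1
union all
select * from espa_trav_57.Table_Dummy_2;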
CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
C1 INTEGER,
C2 BYTEINT,
C3 DATE FORMAT 'YY/MM/DD',
C4 DATE FORMAT 'YY/MM/DD')
PRIMARY INDEX ( C1 );
CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY_1 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
C1 INTEGER,
C2 BYTEINT,
C3 DATE FORMAT 'YY/MM/DD',
C4 DATE FORMAT 'YY/MM/DD')
PRIMARY INDEX ( C1 );
CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY_2 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
C1 INTEGER,
C2 BYTEINT,
C3 DATE FORMAT 'YY/MM/DD',
C4 DATE FORMAT 'YY/MM/DD')
PRIMARY INDEX ( C1 );
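As a side check, one way to confirm that the three tables really do distribute identically on C1 is to count rows per AMP with Teradata's hash functions; a sketch (run the same query against each table and compare):

-- Sketch: rows per AMP for one table; matching distributions across all
-- three tables confirm the shared NUPI hashes the same way everywhere.
select hashamp(hashbucket(hashrow(C1))) as amp_no,
       count(*) as row_count
from espa_trav_57.Table_Dummy_1
group by 1
order by 1;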
explain
insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_1
union
select * from espa_trav_57.Table_Dummy_2
Explanation
1) First, we lock a distinct espa_trav_57."pseudo table" for read on
a RowHash to prevent global deadlock for
espa_trav_57.Table_Dummy_2.
2) Next, we lock a distinct espa_trav_57."pseudo table" for read on a
RowHash to prevent global deadlock for espa_trav_57.Table_Dummy_1.
3) We lock a distinct espa_trav_57."pseudo table" for write on a
RowHash to prevent global deadlock for espa_trav_57.Table_Dummy.
4) We lock espa_trav_57.Table_Dummy_2 for read, we lock
espa_trav_57.Table_Dummy_1 for read, and we lock
espa_trav_57.Table_Dummy for write.
5) We do an all-AMPs RETRIEVE step from espa_trav_57.Table_Dummy_1 by
way of an all-rows scan with no residual conditions into Spool 1
(all_amps), which is redistributed by hash code to all AMPs. The
input table will not be cached in memory, but it is eligible for
synchronized scanning. The result spool file will not be cached
in memory. The size of Spool 1 is estimated with high confidence
to be 500,000,000 rows. The estimated time for this step is 138
hours and 53 minutes.
6) We do an all-AMPs RETRIEVE step from espa_trav_57.Table_Dummy_2 by
way of an all-rows scan with no residual conditions into Spool 1
(all_amps), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 1 by the sort key in spool field1
eliminating duplicate rows. The input table will not be cached in
memory, but it is eligible for synchronized scanning. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with high confidence to be 350,000,000 rows. The
estimated time for this step is 55 hours and 33 minutes.
7) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 2 (all_amps), which is redistributed
by hash code to all AMPs. Then we do a SORT to order Spool 2 by
row hash. The result spool file will not be cached in memory.
The size of Spool 2 is estimated with high confidence to be
700,000,000 rows. The estimated time for this step is 13 minutes
and 8 seconds.
8) We do an all-AMPs MERGE into espa_trav_57.Table_Dummy from Spool 2
(Last Use).
9) We spoil the parser's dictionary cache for the table.
10) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
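The redistribution in steps 5 and 6 is driven by the duplicate elimination that UNION implies: the sort key in step 6 is the entire row, so the spool has to be hashed on all four columns rather than on the PI column C1 alone. Written out explicitly, the UNION above is equivalent to a whole-row DISTINCT over a UNION ALL; a sketch of that equivalent form:

-- Equivalent form: UNION ALL plus whole-row DISTINCT.  The DISTINCT over
-- (C1, C2, C3, C4) is what forces a sort, and hence a redistribution, on
-- more than just the NUPI column C1.
select distinct C1, C2, C3, C4
from (
    select * from espa_trav_57.Table_Dummy_1
    union all
    select * from espa_trav_57.Table_Dummy_2
) as t;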
explain
insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_1
Explanation
1) First, we lock a distinct espa_trav_57."pseudo table" for read on
a RowHash to prevent global deadlock for
espa_trav_57.Table_Dummy_1.
2) Next, we lock a distinct espa_trav_57."pseudo table" for write on
a RowHash to prevent global deadlock for espa_trav_57.Table_Dummy.
3) We lock espa_trav_57.Table_Dummy_1 for read, and we lock
espa_trav_57.Table_Dummy for write.
4) We do an all-AMPs MERGE into espa_trav_57.Table_Dummy from
espa_trav_57.Table_Dummy_1.
5) We spoil the parser's dictionary cache for the table.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
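This second explain shows why the copy is cheap: a plain INSERT ... SELECT between tables with the same NUPI goes straight to an AMP-local MERGE with no spool at all. One common workaround is therefore to run two separate inserts and let the SET target table perform the duplicate-row check itself; a sketch, assuming the silent discarding of full-row duplicates by a SET table is acceptable here:

-- Sketch: two AMP-local inserts instead of one UNION.  Because
-- espa_trav_57.Table_Dummy is a SET table, an INSERT ... SELECT silently
-- discards any row that exactly duplicates one already in the target,
-- which should give the same end result as the UNION in this case.
insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_1;

insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_2;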