Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Thu, 19 Mar 2009 @ 22:19:57 GMT


     
  <Prev Next>  
<<First
<Prev
Next> Last>>  


Subj:   Why redistribution in this select with UNION
 
From:   Simard Rudel

Hi all,

I have a first table with 500 000 000 records, a second table with 200 000 000 records. I want merge those 2 tables to one table. I use the UNION command to merge those 2 tables. The explain plan gives me a redistribution but those 3 tables have the same NUPI.I sent the explain and the DDL. Also, I look at an explain to do a copy of a table with the same NUPI, and Teradata doesn't redistribute.

I don't understand why Teradata want to do the redistribution with UNION. My guess, Teradata want to be sure to not have dupplicate data.

     CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
     C1 INTEGER,
     C2 BYTEINT,
     C3 DATE FORMAT 'YY/MM/DD',
     C4 DATE FORMAT 'YY/MM/DD')
     PRIMARY INDEX ( C1 );


     CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY_1 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
     C1 INTEGER,
     C2 BYTEINT,
     C3 DATE FORMAT 'YY/MM/DD',
     C4 DATE FORMAT 'YY/MM/DD')
     PRIMARY INDEX ( C1 );


     CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY_2 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
     C1 INTEGER,
     C2 BYTEINT,
     C3 DATE FORMAT 'YY/MM/DD',
     C4 DATE FORMAT 'YY/MM/DD')
     PRIMARY INDEX ( C1 );



     explain
     insert into espa_trav_57.Table_Dummy
     select * from espa_trav_57.Table_Dummy_1
     union
     select * from espa_trav_57.Table_Dummy_2

     Explanation
       1) First, we lock a distinct espa_trav_57."pseudo table" for read on
          a RowHash to prevent global deadlock for
          espa_trav_57.Table_Dummy_2.
       2) Next, we lock a distinct espa_trav_57."pseudo table" for read on a
          RowHash to prevent global deadlock for espa_trav_57.Table_Dummy_1.
       3) We lock a distinct espa_trav_57."pseudo table" for write on a
          RowHash to prevent global deadlock for espa_trav_57.Table_Dummy.
       4) We lock espa_trav_57.Table_Dummy_2 for read, we lock
          espa_trav_57.Table_Dummy_1 for read, and we lock
          espa_trav_57.Table_Dummy for write.
       5) We do an all-AMPs RETRIEVE step from espa_trav_57.Table_Dummy_1 by
          way of an all-rows scan with no residual conditions into Spool 1
          (all_amps), which is redistributed by hash code to all AMPs.  The
          input table will not be cached in memory, but it is eligible for
          synchronized scanning.  The result spool file will not be cached
          in memory.  The size of Spool 1 is estimated with high confidence
          to be 500,000,000 rows.  The estimated time for this step is 138
          hours and 53 minutes.
       6) We do an all-AMPs RETRIEVE step from espa_trav_57.Table_Dummy_2 by
          way of an all-rows scan with no residual conditions into Spool 1
          (all_amps), which is redistributed by hash code to all AMPs.  Then
          we do a SORT to order Spool 1 by the sort key in spool field1
          eliminating duplicate rows.  The input table will not be cached in
          memory, but it is eligible for synchronized scanning.  The result
          spool file will not be cached in memory.  The size of Spool 1 is
          estimated with high confidence to be 350,000,000 rows.  The
          estimated time for this step is 55 hours and 33 minutes.
       7) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
          an all-rows scan into Spool 2 (all_amps), which is redistributed
          by hash code to all AMPs.  Then we do a SORT to order Spool 2 by
          row hash.  The result spool file will not be cached in memory.
          The size of Spool 2 is estimated with high confidence to be
          700,000,000 rows.  The estimated time for this step is 13 minutes
          and 8 seconds.
       8) We do an all-AMPs MERGE into espa_trav_57.Table_Dummy from Spool 2
          (Last Use).
       9) We spoil the parser's dictionary cache for the table.
      10) Finally, we send out an END TRANSACTION step to all AMPs involved
          in processing the request.
       -> No rows are returned to the user as the result of statement 1.



     explain
     insert into espa_trav_57.Table_Dummy
     select * from espa_trav_57.Table_Dummy_1

     Explanation
       1) First, we lock a distinct espa_trav_57."pseudo table" for read on
          a RowHash to prevent global deadlock for
          espa_trav_57.Table_Dummy_1.
       2) Next, we lock a distinct espa_trav_57."pseudo table" for write on
          a RowHash to prevent global deadlock for espa_trav_57.Table_Dummy.
       3) We lock espa_trav_57.Table_Dummy_1 for read, and we lock
          espa_trav_57.Table_Dummy for write.
       4) We do an all-AMPs MERGE into espa_trav_57.Table_Dummy from
          espa_trav_57.Table_Dummy_1.
       5) We spoil the parser's dictionary cache for the table.
       6) Finally, we send out an END TRANSACTION step to all AMPs involved
          in processing the request.
       -> No rows are returned to the user as the result of statement 1.


     
  <Prev Next>  
<<First
<Prev
Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023