Archives of the TeradataForum

Message Posted: Thu, 19 Mar 2009 @ 22:19:57 GMT
Subj: Why redistribution in this select with UNION
From: Simard Rudel
Hi all,
I have a first table with 500,000,000 records and a second table with 200,000,000 records. I want to merge these two tables into one table, using the UNION command. The explain plan shows a redistribution even though all three tables have the same NUPI. I have included the explain output and the DDL below. I also looked at the explain for a straight copy of a table with the same NUPI, and there Teradata does not redistribute.
I don't understand why Teradata wants to do the redistribution with UNION. My guess is that Teradata wants to be sure not to end up with duplicate data.
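If the duplicate elimination is not actually needed, UNION ALL should avoid the sort and redistribution entirely. A minimal sketch against the same tables, assuming duplicates between the two sources either cannot occur or are acceptable:

-- Sketch: same insert, but UNION ALL skips duplicate elimination,
-- so no whole-row sort/redistribution step should be needed.
explain
insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_1
union all
select * from espa_trav_57.Table_Dummy_2;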
CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
C1 INTEGER,
C2 BYTEINT,
C3 DATE FORMAT 'YY/MM/DD',
C4 DATE FORMAT 'YY/MM/DD')
PRIMARY INDEX ( C1 );
CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY_1 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
C1 INTEGER,
C2 BYTEINT,
C3 DATE FORMAT 'YY/MM/DD',
C4 DATE FORMAT 'YY/MM/DD')
PRIMARY INDEX ( C1 );
CREATE SET TABLE ESPA_TRAV_57.TABLE_DUMMY_2 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
C1 INTEGER,
C2 BYTEINT,
C3 DATE FORMAT 'YY/MM/DD',
C4 DATE FORMAT 'YY/MM/DD')
PRIMARY INDEX ( C1 );
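As a side check, one way to confirm that the three tables really do distribute identically on C1 is to count rows per AMP with Teradata's hash functions; a sketch (run the same query against each table and compare):

-- Sketch: rows per AMP for one table; matching distributions across all
-- three tables confirm the shared NUPI hashes the same way everywhere.
select hashamp(hashbucket(hashrow(C1))) as amp_no,
       count(*) as row_count
from espa_trav_57.Table_Dummy_1
group by 1
order by 1;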
explain
insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_1
union
select * from espa_trav_57.Table_Dummy_2
Explanation
1) First, we lock a distinct espa_trav_57."pseudo table" for read on
a RowHash to prevent global deadlock for
espa_trav_57.Table_Dummy_2.
2) Next, we lock a distinct espa_trav_57."pseudo table" for read on a
RowHash to prevent global deadlock for espa_trav_57.Table_Dummy_1.
3) We lock a distinct espa_trav_57."pseudo table" for write on a
RowHash to prevent global deadlock for espa_trav_57.Table_Dummy.
4) We lock espa_trav_57.Table_Dummy_2 for read, we lock
espa_trav_57.Table_Dummy_1 for read, and we lock
espa_trav_57.Table_Dummy for write.
5) We do an all-AMPs RETRIEVE step from espa_trav_57.Table_Dummy_1 by
way of an all-rows scan with no residual conditions into Spool 1
(all_amps), which is redistributed by hash code to all AMPs. The
input table will not be cached in memory, but it is eligible for
synchronized scanning. The result spool file will not be cached
in memory. The size of Spool 1 is estimated with high confidence
to be 500,000,000 rows. The estimated time for this step is 138
hours and 53 minutes.
6) We do an all-AMPs RETRIEVE step from espa_trav_57.Table_Dummy_2 by
way of an all-rows scan with no residual conditions into Spool 1
(all_amps), which is redistributed by hash code to all AMPs. Then
we do a SORT to order Spool 1 by the sort key in spool field1
eliminating duplicate rows. The input table will not be cached in
memory, but it is eligible for synchronized scanning. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with high confidence to be 350,000,000 rows. The
estimated time for this step is 55 hours and 33 minutes.
7) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 2 (all_amps), which is redistributed
by hash code to all AMPs. Then we do a SORT to order Spool 2 by
row hash. The result spool file will not be cached in memory.
The size of Spool 2 is estimated with high confidence to be
700,000,000 rows. The estimated time for this step is 13 minutes
and 8 seconds.
8) We do an all-AMPs MERGE into espa_trav_57.Table_Dummy from Spool 2
(Last Use).
9) We spoil the parser's dictionary cache for the table.
10) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
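The redistribution in steps 5 and 6 is driven by the duplicate elimination that UNION implies: the sort key in step 6 is the entire row, so the spool has to be hashed on all four columns rather than on the PI column C1 alone. Written out explicitly, the UNION above is equivalent to a whole-row DISTINCT over a UNION ALL; a sketch of that equivalent form:

-- Equivalent form: UNION ALL plus whole-row DISTINCT.  The DISTINCT over
-- (C1, C2, C3, C4) is what forces a sort, and hence a redistribution, on
-- more than just the NUPI column C1.
select distinct C1, C2, C3, C4
from (
    select * from espa_trav_57.Table_Dummy_1
    union all
    select * from espa_trav_57.Table_Dummy_2
) as t;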
explain
insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_1
Explanation
1) First, we lock a distinct espa_trav_57."pseudo table" for read on
a RowHash to prevent global deadlock for
espa_trav_57.Table_Dummy_1.
2) Next, we lock a distinct espa_trav_57."pseudo table" for write on
a RowHash to prevent global deadlock for espa_trav_57.Table_Dummy.
3) We lock espa_trav_57.Table_Dummy_1 for read, and we lock
espa_trav_57.Table_Dummy for write.
4) We do an all-AMPs MERGE into espa_trav_57.Table_Dummy from
espa_trav_57.Table_Dummy_1.
5) We spoil the parser's dictionary cache for the table.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
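This second explain shows why the copy is cheap: a plain INSERT ... SELECT between tables with the same NUPI goes straight to an AMP-local MERGE with no spool at all. One common workaround is therefore to run two separate inserts and let the SET target table perform the duplicate-row check itself; a sketch, assuming the silent discarding of full-row duplicates by a SET table is acceptable here:

-- Sketch: two AMP-local inserts instead of one UNION.  Because
-- espa_trav_57.Table_Dummy is a SET table, an INSERT ... SELECT silently
-- discards any row that exactly duplicates one already in the target,
-- which should give the same end result as the UNION in this case.
insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_1;

insert into espa_trav_57.Table_Dummy
select * from espa_trav_57.Table_Dummy_2;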