Archives of the TeradataForum
Message Posted: Mon, 26 Sep 2005 @ 19:38:24 GMT
Subj:  Re: Redistributing large tables instead of replicating a small table
From:  Nomula, Madhukar
Thanks for all the replies.
We have 296 AMPs in our system. The loc_id column on the location table is unique (as of now); it is just not defined as a UPI. For some reason
it is defined as a NUPI. This table has only two columns (loc_id and location).
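Roughly, the lookup table looks like this (I'm reconstructing the DDL; the LOCATION column length and the NOT NULL are my assumptions, only the
CHAR(5) LOC_ID and the NUPI are as described above):

    -- Sketch of the lookup table as described: two columns, NUPI on LOC_ID.
    -- The LOCATION length and NOT NULL are assumptions, not the real DDL.
    CREATE TABLE BIGB.LOCATION
    (
        LOC_ID    CHAR(5)      NOT NULL,
        LOCATION  VARCHAR(100)
    )
    PRIMARY INDEX (LOC_ID);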
Stats are collected on the primary index column of the large table. That table is defined as MULTISET, but there are no duplicate
records. LOC_ID is NOT the PI on the large table.
LOC_ID is a character field, defined as CHAR(20) on the large table and CHAR(5) on the small table.
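The relevant part of the large table looks roughly like this (CONTACT_ID is just a placeholder name for the real PI column, the remaining columns
are omitted, and the database name BIGB is taken from the explain plan below):

    -- Sketch of the large table: MULTISET, PI on a different column,
    -- LOC_ID defined as CHAR(20). CONTACT_ID is a placeholder name.
    CREATE MULTISET TABLE BIGB.CONTACT
    (
        CONTACT_ID  INTEGER NOT NULL,
        LOC_ID      CHAR(20)
    )
    PRIMARY INDEX (CONTACT_ID);

    -- Stats exist on the PI column only, along these lines:
    COLLECT STATISTICS ON BIGB.CONTACT COLUMN CONTACT_ID;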
I tried creating another small table with loc_id as a UPI, but the explain plans still redistribute the large table. The only change is that the
number of rows in the spool after the product join is 40 million. I am not sure why it shows 80 million with the NUPI version, even though the
stats are collected.
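For what it's worth, the UPI test copy was built along these lines (LOCATION_UPI is just an illustrative name, and the LOCATION column carries the
same assumed length as above):

    -- Illustrative sketch only: a copy of the lookup table with LOC_ID as a UPI.
    CREATE TABLE BIGB.LOCATION_UPI
    (
        LOC_ID    CHAR(5)      NOT NULL,
        LOCATION  VARCHAR(100)
    )
    UNIQUE PRIMARY INDEX (LOC_ID);

    INSERT INTO BIGB.LOCATION_UPI
    SELECT LOC_ID, LOCATION
    FROM   BIGB.LOCATION;

    COLLECT STATISTICS ON BIGB.LOCATION_UPI COLUMN LOC_ID;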
Even the following query redistributes the large table. I have now come to the conclusion that redistributing the large table might be cheaper
than replicating the small table (evident from the fact that the explain plan redistributes the small table rather than duplicating it). But how
do I make the optimizer redistribute only the small table, given that we are having spool issues? Processing time is not important here, as this
query would be part of ETL. We also have many similar incidents that are causing spool space issues.
SELECT *
FROM CONTACT C
LEFT JOIN
  (
    SELECT LOC_ID, LOCATION FROM LOCATION
  ) D (LOC_ID, LOCATION)
ON C.LOC_ID = D.LOC_ID;
Explanation
--------------------------------------------------
  1) First, we lock a distinct BIGB."pseudo table" for read on a RowHash
     to prevent global deadlock for BIGB.C.
  2) Next, we lock a distinct BIGB."pseudo table" for read on a RowHash
     to prevent global deadlock for BIGB.LOCATION.
  3) We lock BIGB.C for read, and we lock BIGB.LOCATION for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from BIGB.LOCATION by way of an
          all-rows scan with no residual conditions into Spool 1
          (all_amps), which is redistributed by hash code to all AMPs.
          The size of Spool 1 is estimated with high confidence to be
          2,619 rows. The estimated time for this step is 0.03 seconds.
       2) We do an all-AMPs RETRIEVE step from BIGB.C by way of an
          all-rows scan with no residual conditions into Spool 3
          (all_amps), which is redistributed by hash code to all AMPs.
          The input table will not be cached in memory, but it is
          eligible for synchronized scanning. The result spool file will
          not be cached in memory. The size of Spool 3 is estimated with
          high confidence to be 40,323,499 rows. The estimated time for
          this step is 1 minute and 35 seconds.
  5) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an
     all-rows scan into Spool 4 (all_amps), which is redistributed by
     hash code to all AMPs. The size of Spool 4 is estimated with high
     confidence to be 2,619 rows. The estimated time for this step is
     0.03 seconds.
  6) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
     all-rows scan, which is joined to Spool 4 (Last Use) by way of an
     all-rows scan. Spool 3 and Spool 4 are left outer joined using a
     product join, with a join condition of ("LOC_ID = LOC_ID"). The
     result goes into Spool 2 (group_amps), which is built locally on the
     AMPs. The result spool file will not be cached in memory. The size
     of Spool 2 is estimated with no confidence to be 41,304,142 rows.
     The estimated time for this step is 58.37 seconds.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 2 are sent back to the user as the result of
     statement 1. The total estimated time is 2 minutes and 34 seconds.