Archives of the TeradataForum
Message Posted: Mon, 26 Sep 2005 @ 19:38:24 GMT
Subj:  Re: Redistributing large tables instead of replicating a small table
From:  Nomula, Madhukar
Thanks for all the replies.
We have 296 AMPs in our system. The loc_id column on the location table is unique (as of now); it is just not defined as a UPI. For some reason
it is defined as a NUPI. This table has only two columns (loc_id and location).
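Roughly, the lookup table looks like this (I'm reconstructing the DDL; the LOCATION column length and the NOT NULL are my assumptions, only the
CHAR(5) LOC_ID and the NUPI are as described above):

    -- Sketch of the lookup table as described: two columns, NUPI on LOC_ID.
    -- The LOCATION length and NOT NULL are assumptions, not the real DDL.
    CREATE TABLE BIGB.LOCATION
    (
        LOC_ID    CHAR(5)      NOT NULL,
        LOCATION  VARCHAR(100)
    )
    PRIMARY INDEX (LOC_ID);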
Stats are collected on the primary index column of the large table. That table is defined as MULTISET, but there are no duplicate
records. LOC_ID is NOT the PI on the large table.
LOC_ID is a character field, defined as CHAR(20) on the large table and CHAR(5) on the small table.
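The relevant part of the large table looks roughly like this (CONTACT_ID is just a placeholder name for the real PI column, the remaining columns
are omitted, and the database name BIGB is taken from the explain plan below):

    -- Sketch of the large table: MULTISET, PI on a different column,
    -- LOC_ID defined as CHAR(20). CONTACT_ID is a placeholder name.
    CREATE MULTISET TABLE BIGB.CONTACT
    (
        CONTACT_ID  INTEGER NOT NULL,
        LOC_ID      CHAR(20)
    )
    PRIMARY INDEX (CONTACT_ID);

    -- Stats exist on the PI column only, along these lines:
    COLLECT STATISTICS ON BIGB.CONTACT COLUMN CONTACT_ID;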
I tried creating another small table with loc_id as a UPI, but the explain plans still redistribute the large table. The only change is that the
number of rows in the spool after the product join is 40 million. I am not sure why it shows 80 million with the NUPI version, even though the
stats are collected.
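For what it's worth, the UPI test copy was built along these lines (LOCATION_UPI is just an illustrative name, and the LOCATION column carries the
same assumed length as above):

    -- Illustrative sketch only: a copy of the lookup table with LOC_ID as a UPI.
    CREATE TABLE BIGB.LOCATION_UPI
    (
        LOC_ID    CHAR(5)      NOT NULL,
        LOCATION  VARCHAR(100)
    )
    UNIQUE PRIMARY INDEX (LOC_ID);

    INSERT INTO BIGB.LOCATION_UPI
    SELECT LOC_ID, LOCATION
    FROM   BIGB.LOCATION;

    COLLECT STATISTICS ON BIGB.LOCATION_UPI COLUMN LOC_ID;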
Even the following query redistributes the large table. I have now come to the conclusion that redistributing the large table might be cheaper
than replicating the small table (evident from the fact that the explain plan redistributes the small table rather than duplicating it). But how
do I make the optimizer redistribute only the small table, given that we are having spool issues? Processing time is not important here, as this
query would be part of ETL. We also have many similar incidents that are causing spool space issues.
SELECT *
FROM CONTACT C
LEFT JOIN
  (
    SELECT LOC_ID, LOCATION FROM LOCATION
  ) D (LOC_ID, LOCATION)
ON C.LOC_ID = D.LOC_ID;
Explanation
--------------------------------------------------
  1) First, we lock a distinct BIGB."pseudo table" for read on a RowHash
     to prevent global deadlock for BIGB.C.
  2) Next, we lock a distinct BIGB."pseudo table" for read on a RowHash
     to prevent global deadlock for BIGB.LOCATION.
  3) We lock BIGB.C for read, and we lock BIGB.LOCATION for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from BIGB.LOCATION by way of an
          all-rows scan with no residual conditions into Spool 1
          (all_amps), which is redistributed by hash code to all AMPs.
          The size of Spool 1 is estimated with high confidence to be
          2,619 rows. The estimated time for this step is 0.03 seconds.
       2) We do an all-AMPs RETRIEVE step from BIGB.C by way of an
          all-rows scan with no residual conditions into Spool 3
          (all_amps), which is redistributed by hash code to all AMPs.
          The input table will not be cached in memory, but it is
          eligible for synchronized scanning. The result spool file will
          not be cached in memory. The size of Spool 3 is estimated with
          high confidence to be 40,323,499 rows. The estimated time for
          this step is 1 minute and 35 seconds.
  5) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an
     all-rows scan into Spool 4 (all_amps), which is redistributed by
     hash code to all AMPs. The size of Spool 4 is estimated with high
     confidence to be 2,619 rows. The estimated time for this step is
     0.03 seconds.
  6) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
     all-rows scan, which is joined to Spool 4 (Last Use) by way of an
     all-rows scan. Spool 3 and Spool 4 are left outer joined using a
     product join, with a join condition of ("LOC_ID = LOC_ID"). The
     result goes into Spool 2 (group_amps), which is built locally on the
     AMPs. The result spool file will not be cached in memory. The size
     of Spool 2 is estimated with no confidence to be 41,304,142 rows.
     The estimated time for this step is 58.37 seconds.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 2 are sent back to the user as the result of
     statement 1. The total estimated time is 2 minutes and 34 seconds.