Redistributing large tables instead of replicating a small table [From Nomula, Madhukar: Mon, 26 Sep 2005 @ 15:06 GMT]

Message Posted: Mon, 26 Sep 2005 @ 15:06:28 GMT

Subj: Redistributing large tables instead of replicating a small table

From: Nomula, Madhukar

I know this question has been asked before, but I did not see an answer.

I am trying to join a large 40 million row table with a small 2,600 row table, but the explain plan redistributes the large table instead of replicating the small table on all AMPs.

Here is the query:

SELECT * FROM CONTACT C LEFT JOIN LOCATION D ON C.LOC_ID = D.LOC_ID;

LOC_ID is the NUPI of the LOCATION table. Statistics are collected on both columns (C.LOC_ID and D.LOC_ID), as the optimizer's high-confidence estimates show. Still, the explain plan below redistributes the large table. Isn't it cheaper to replicate the small table?
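To put rough numbers on the question, here is a minimal Python sketch comparing how many rows each strategy would write to spool. The AMP count (`NUM_AMPS = 100`) is a hypothetical value, since the post does not say how large the system is, and the count ignores row width and everything else in the optimizer's real cost model.

```python
# Hypothetical AMP count; the post does not state the system size.
NUM_AMPS = 100

LARGE_ROWS = 40_323_499  # Spool 2 estimate for CONTACT in the explain plan
SMALL_ROWS = 2_600       # approximate size of LOCATION given in the post


def rows_spooled_by_redistribution(row_count: int) -> int:
    """Redistribution writes each row once, on the AMP its hash maps to."""
    return row_count


def rows_spooled_by_duplication(row_count: int, num_amps: int) -> int:
    """Duplication writes a full copy of the table on every AMP."""
    return row_count * num_amps


redistribute_large = rows_spooled_by_redistribution(LARGE_ROWS)
duplicate_small = rows_spooled_by_duplication(SMALL_ROWS, NUM_AMPS)

print(f"redistribute CONTACT: {redistribute_large:,} spool rows")
print(f"duplicate LOCATION:   {duplicate_small:,} spool rows")
print(f"ratio: {redistribute_large / duplicate_small:.1f}x")
```

By this crude count, duplicating LOCATION writes far fewer spool rows unless the system has many thousands of AMPs, which is why the plan looks surprising.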

Explanation --------------------------------------------------

1) First, we lock a distinct BIGB."pseudo table" for read on a RowHash to prevent global deadlock for BIGB.C.

2) Next, we lock a distinct BIGB."pseudo table" for read on a RowHash to prevent global deadlock for BIGB.D.

3) We lock BIGB.C for read, and we lock BIGB.D for read.

4) We do an all-AMPs RETRIEVE step from BIGB.C by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 2 is estimated with high confidence to be 40,323,499 rows. The estimated time for this step is 1 minute and 35 seconds.

5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to BIGB.D by way of an all-rows scan with no residual conditions. Spool 2 and BIGB.D are left outer joined using a product join, with a join condition of ("LOC_ID = BIGB.D.LOC_ID"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with index join confidence to be 80,644,379 rows. The estimated time for this step is 1 minute and 32 seconds.

6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 3 minutes and 7 seconds.
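The figures in the plan can be cross-checked with a little arithmetic. The Python sketch below uses only numbers quoted in the explain output; the variable names are illustrative. Reading the row ratio as "about two LOCATION matches per CONTACT row" is an inference that fits LOC_ID being a non-unique index, not something the plan states.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds estimate from the plan into seconds."""
    return minutes * 60 + seconds


# Step estimates quoted in the explain output.
step4_seconds = to_seconds(1, 35)  # redistribute CONTACT into Spool 2
step5_seconds = to_seconds(1, 32)  # product join of Spool 2 with LOCATION

total_minutes, total_seconds = divmod(step4_seconds + step5_seconds, 60)
print(f"total: {total_minutes} min {total_seconds} s")

# Row estimates quoted in the explain output.
spool2_rows = 40_323_499  # join input (CONTACT)
spool1_rows = 80_644_379  # join output

# Output rows per input row; for a left outer join this is at least 1.
rows_out_per_row_in = spool1_rows / spool2_rows
print(f"output rows per input row: {rows_out_per_row_in:.3f}")
```

The two step estimates add up to the stated total, and the join is estimated to return roughly twice as many rows as CONTACT contains.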

Thanks,

Madhukar.
