Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Mon, 26 Sep 2005 @ 15:06:28 GMT


     
  <Prev Next>  
<<First
<Prev
Next> Last>>  


Subj:   Redestributing large tables instead of replicating a small table
 
From:   Nomula, Madhukar

I know this question is asked before, but I did not see the answer.

I am trying to join a large 40 million row table with a small 2600 row table. But the explain plan is re-distributing the large table instead replicating the small table on all amps.

Here is the query:

     SELECT * FROM
     CONTACT C
     LEFT JOIN
     LOCATION D
     ON
       C.LOC_ID = D.LOC_ID;

The LOC_ID is the NUPI on LOCATION table. The stats are collected on both columns (C.LOC_ID and D.LOC_ID), which can be seen with high confidence of the optimizer. But still here is explain plan. Isn't it cheaper to replicate the small table.

Explanation
--------------------------------------------------
 
  1)First, we lock a distinct BIGB."pseudo table" for read on a RowHash to prevent global deadlock for BIGB.C.  
  2)Next, we lock a distinct BIGB."pseudo table" for read on a RowHash to prevent global deadlock for BIGB.D.  
  3)We lock BIGB.C for read, and we lock BIGB.D for read.  
  4)We do an all-AMPs RETRIEVE step from BIGB.C by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by hash code to all AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 2 is estimated with high confidence to be 40,323,499 rows. The estimated time for this step is 1 minute and 35 seconds.  
  5)We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to BIGB.D by way of an all-rows scan with no residual conditions. Spool 2 and BIGB.D are left outer joined using a product join, with a join condition of ("LOC_ID = BIGB.D.LOC_ID"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with index join confidence to be 80,644,379 rows. The estimated time for this step is 1 minute and 32 seconds.  
  6)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 3 minutes and 7 seconds.  


Thanks,

Madhukar.



     
  <Prev Next>  
<<First
<Prev
Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016