Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Thu, 18 Jul 2002 @ 14:57:40 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: The Optimizer and Multi Inlists
 
From:   David Wellman

Why not try building your list of values in a derived table and then joining the derived table to the table with the multi-part PI. I tried the following on a table (vedcuser04) which has two columns (col01 and col03) defined as the PI. My requiredment is to find rows where col01 in ('david','fred') and col04 = 1000102.

I coded it as:

sel *
from vedcuser04
where (col01,col03)
in (select 'david' (char(30) ,character set latin ),1000102 (date)
     from sys_calendar.calendar
     where  calendar_date=date
     union all
    select 'fred' (char(30) ,character set latin ),1000102 (date)
     from sys_calendar.calendar
     where  calendar_date=date);

The explain shows that the table with the multi-part PI (vedcuser04) is accessed last and is accessed using a merge join on the PI columns (Step 7). Note that table vedcuser04 is only @4000 rows on a 20-amp system so even with a 'small' table you get the right join plan:

Explanation
--------------------------------------------------
 
  1)First, we lock a distinct VEDB4NODE."pseudo table" for read on a RowHash to prevent global deadlock for VEDB4NODE.vedcuser04.  
  2)Next, we lock VEDB4NODE.vedcuser04 for read.  
  3)We do a single-AMP RETRIEVE step from SYS_CALENDAR.CALDATES by way of the unique primary index "SYS_CALENDAR.CALDATES.cdate = DATE" with no residual conditions into Spool 1, which is built locally on that AMP. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.15 seconds.  
  4)We do a single-AMP RETRIEVE step from SYS_CALENDAR.CALDATES by way of the unique primary index "SYS_CALENDAR.CALDATES.cdate = DATE" with no residual conditions into Spool 1, which is built locally on that AMP. The size of Spool 1 is estimated with high confidence to be 2 rows. The estimated time for this step is 0.15 seconds.  
  5)We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan into Spool 3, which is redistributed by hash code to all AMPs. The size of Spool 3 is estimated with high confidence to be 2 rows. The estimated time for this step is 0.17 seconds.  
  6)We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 4, which is built locally on the AMPs. Then we do a SORT to order Spool 4 by row hash. The size of Spool 4 is estimated with high confidence to be 1 row. The estimated time for this step is 0.04 seconds.  
  7)We do an all-AMPs JOIN step from VEDB4NODE.vedcuser04 by way of an all-rows scan with no residual conditions, which is joined to Spool 4 (Last Use). VEDB4NODE.vedcuser04 and Spool 4 are joined using an inclusion merge join, with a join condition of ("(VEDB4NODE.vedcuser04.COL01 =) AND (VEDB4NODE.vedcuser04.COL03=)"). The result goes into Spool 2, which is built locally on the AMPs. The size of Spool 2 is estimated with low confidence to be 2 rows. The estimated time for this step is 0.20 seconds.  
  8)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 2 are sent back to the user as the result of statement 1. The total estimated time is 0.72 seconds.  


To be able to use this in a 'real world' example you need to get at the application code, or you may have to set up your application to create/populate a volatile table instead of a derived table, but having done one of those you'll probably find that you get the desired explain plan and therefore performance.


Cheers,

Dave

Ward Analytics Ltd: Information in motion (www.ward-analytics.com)



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023