Archives of the TeradataForum

Message Posted: Thu, 18 Jul 2002 @ 14:57:40 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: The Optimizer and Multi Inlists

From:		David Wellman

Why not try building your list of values in a derived table and then joining the derived table to the table with the multi-part PI. I tried the following on a table (vedcuser04) which has two columns (col01 and col03) defined as the PI. My requiredment is to find rows where col01 in ('david','fred') and col04 = 1000102.

I coded it as:

sel *
from vedcuser04
where (col01,col03)
in (select 'david' (char(30) ,character set latin ),1000102 (date)
     from sys_calendar.calendar
     where  calendar_date=date
     union all
    select 'fred' (char(30) ,character set latin ),1000102 (date)
     from sys_calendar.calendar
     where  calendar_date=date);

The explain shows that the table with the multi-part PI (vedcuser04) is accessed last and is accessed using a merge join on the PI columns (Step 7). Note that table vedcuser04 is only @4000 rows on a 20-amp system so even with a 'small' table you get the right join plan:

`Explanation --------------------------------------------------`
	`1)`	`First, we lock a distinct VEDB4NODE."pseudo table" for read on a RowHash to prevent global deadlock for VEDB4NODE.vedcuser04.`
	`2)`	`Next, we lock VEDB4NODE.vedcuser04 for read.`
	`3)`	`We do a single-AMP RETRIEVE step from SYS_CALENDAR.CALDATES by way of the unique primary index "SYS_CALENDAR.CALDATES.cdate = DATE" with no residual conditions into Spool 1, which is built locally on that AMP. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.15 seconds.`
	`4)`	`We do a single-AMP RETRIEVE step from SYS_CALENDAR.CALDATES by way of the unique primary index "SYS_CALENDAR.CALDATES.cdate = DATE" with no residual conditions into Spool 1, which is built locally on that AMP. The size of Spool 1 is estimated with high confidence to be 2 rows. The estimated time for this step is 0.15 seconds.`
	`5)`	`We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan into Spool 3, which is redistributed by hash code to all AMPs. The size of Spool 3 is estimated with high confidence to be 2 rows. The estimated time for this step is 0.17 seconds.`
	`6)`	`We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 4, which is built locally on the AMPs. Then we do a SORT to order Spool 4 by row hash. The size of Spool 4 is estimated with high confidence to be 1 row. The estimated time for this step is 0.04 seconds.`
	`7)`	We do an all-AMPs JOIN step from VEDB4NODE.vedcuser04 by way of an all-rows scan with no residual conditions, which is joined to Spool 4 (Last Use). VEDB4NODE.vedcuser04 and Spool 4 are joined using an inclusion merge join, with a join condition of ("(VEDB4NODE.vedcuser04.COL01 =) AND (VEDB4NODE.vedcuser04.COL03=)"). The result goes into Spool 2, which is built locally on the AMPs. The size of Spool 2 is estimated with low confidence to be 2 rows. The estimated time for this step is 0.20 seconds.
	`8)`	`Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.`
	`->`	`The contents of Spool 2 are sent back to the user as the result of statement 1. The total estimated time is 0.72 seconds.`

To be able to use this in a 'real world' example you need to get at the application code, or you may have to set up your application to create/populate a volatile table instead of a derived table, but having done one of those you'll probably find that you get the desired explain plan and therefore performance.

Cheers,

Dave

Ward Analytics Ltd: Information in motion (www.ward-analytics.com)


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference