Archives of the TeradataForum

Message Posted: Tue, 28 Jan 2014 @ 10:49:54 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Inefficient LEFT JOIN execution plan

From:		Anomy Anom

<-- Anonymously Posted: Tuesday, January 28, 2014 05:25 -->

Hi all

sorry for the lengthy post, thanks in advance for any insight or assistance and please tolerate the spelling of all the words with the English S rather than the American Z .

I am on a Teradata 14.0 system.

I have 3 tables, small_table (approx 10 rows per amp), big_table (approx 300,000 rows per amp) and very_big_table (approx 3,000,000 rows per amp)

All 3 tables have the same primary index (with no PPI defined). All 3 tables have full statistics collected.

I also have a view filtered_view which is a select of all data from very_big_table with an inner join on the PI columns to big_table. The data in big_table is a subset of the data in very_big_table.

Something like:

     Create view filtered_view
     as
     sel t1.*
     from very_big_table t1
     inner join big_table t2
     on
     t1.picol=t2.picol;

If I EXPLAIN an INNER JOIN query of the ilk

     sel * from small_table t1 inner join filtered_view t2 on t1.picol=t2.picol;

I can see that the optimiser generates a plan where small_table is merge joined to big_table with the result written to spool and with an estimated number of rows equal to the number of rows on the small table , then in the subsequent step this small spool file is joined to very_big_table again using a merge join with an estimated number of rows to be similar to the small table.

This is great and exactly the plan I wanted as the minimum number of rows are manipulated in each step.

If I EXPLAIN a LEFT OUTER JOIN query of the ilk

     sel * from small_table t1 LEFT join filtered_view t2 on t1.picol=t2.picol;

I can see that the optimiser generates a plan where the big_table is merge joined with the very_big_table with the result written to spool and with an estimated number of rows similar to the number of rows on the big_table, then in the subsequent step the small table is left joined to this large spool file using a merge join with an estimated number of rows similar to the number of rows on the small table.

This is not the plan I was hoping for as there has been a significant amount of processing to build the very large spool file prior to utilising the small table.

So, my questions are

Is the execution plan generated for the LEFT JOIN query the one you would expect ?

Is there a logical reason why the plan should not use the small table to build the initial spool file (E.g. using inner join logic) prior to performing the LEFT join in a subsequent step, thereby vastly reducing the amount of CPU consumed. i.e. is there a scenario where different results may be returned and therefore the optimiser has no choice but to pre resolve the entire filtered_view in spool before using it in the LEFT join with the small table.

Many thanks for any opinions. Please note, I know I can code the query to perform the joins as I wish, I want to understand why the optimiser does not generate the plan which I desire automatically.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference