Archives of the TeradataForum

Message Posted: Fri, 31 Jan 2003 @ 20:04:21 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Query optimization: difference between inner join and derived table?

From:		Dieter N�th

John Grace wrote:

It seems a Filter or Constraint in the Where Clause will be applied when the table in first accessed. So if I ask for one row from the small table, one row is retrieved. (Very fast.) The Large Table is joined to the one row table in memory. (Again very fast.)

SELECT ...
FROM table1
INNER JOIN table2
ON table1.rid=table2.rid
Where table2.rid = "RED"

If I move the Constraint to the From clause, the entire small table is read (maybe slow) and the constraint is applied DURING the join to large table. (maybe slow).

SELECT ...
FROM table1
INNER JOIN table2
ON table1.rid=table2.rid and table2.rid = "RED"

In case of Inner Joins i would call this a bad optimizer ;-)

To make this example a little more interesting, try coupling these queries with outer joins. The timing of the filter and Outer join can have performance implications and produce different results. For example,

SELECT ...
FROM table1
Left outer JOIN table2
ON table1.rid=table2.rid and table2.rid = "RED"
(all rows from the table1)

The constraint on RED "overrides?" the outer. I get what I would expect, just the RED items.

No, you get _all_ rows from table1, but all non-RED data from table2 is NULLed.

SELECT ...
FROM table1
Left outer JOIN table2
ON table1.rid=table2.rid
Where table2.rid = "RED"

The constraint is applied first, then the left outer is performed. I get ALL rows from table1 joined to the subset of "RED" items from table2!

No, it's an Outer join, so first there's the Outer join and then there's the WHERE. And because the constraint is on the inner table, all the NULLed data will be filtered, so the result is the same as an Inner Join.

We have seen some optimizers turn off the outer and perform an inner when a constraint is detected in this manner.

This time it's a smart optimizer, because there's no need to Outer Join and then remove all NULLed rows again.

It matters what side on an outer the constraint is on.

Now, try nesting the outers and add some constraints.

That's why nobody likes complex Outer Joins, sometimes it's really hard to tell, if the result set is the expected one.

SELECT ...
FROM table1 Left Outer JOIN
( table2 Left Outer JOIN
( table3 Left Outer JOIN table4
ON table3.rid=table4.rid
)
ON table2.rid=table3.rid
)
ON table1.rid=table2.rid
Where table3.rid = "Chevy"

All the Left outers, Left of the constraint should switch to inners. I want only "Chevy".

We have found that explicitly switching all LEFT outers, LEFT of the constraint to inners is a safer approach along with a big performance booster. (just the opposite with Right outers)

Never write an Outer Join unless you really need it ;-)

We have also found that it makes sense, for performance reasons, to split up constraints on some tables, placing some constraints in the Where and some in the From. (Hit a specific index in the first pass.)

If it's an Inner Join no different plan should be created by the optimizer.

As I said earlier, this is a hodge-podge of issues we had getting one partner's 3NF, ERP apps working on multiple RDBMS's. Teradata is pretty good with these issues and some others are just now supporting SQL92.

And even with Teradata there's sometimes a difference between old-sytle SQL89 "from table1, table2 where table1.col = table2.col" and new-sytle "from table1 inner join table2 on table1.col = table2.col". If it's a big difference it should be treated as a bug. Btw, in most cases the SQL89-syntax produced the better plan ;-)

Dieter


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference