Re: Left join problem [From Burton, Bruce: Wed, 02 Oct 2002 @ 17:52 GMT]

Message Posted: Wed, 02 Oct 2002 @ 17:52:15 GMT

Subj:

Re: Left join problem

From:

Burton, Bruce

It looks like Teradata uses a hash join with the 'bad' query but a merge join with the 'good' query. Not sure why Teradata decided to use the different methods for the queries...I assume something with the indexing but not sure. Does the 'good' query using sub-selects strip off index info?

Here are the explains from the two queries:

Query that works correctly:
---------------------------

Explanation --------------------------------------------------

1)

First, we lock a distinct mktg_temp."pseudo table" for read on a RowHash to prevent global deadlock for mktg_temp.p065_snet_input.

2)

Next, we lock mktg_temp.p065_snet_input for read, and we lock CMDMDB.Static for access.

3)

We execute the following steps in parallel.

1) We do an all-AMPs RETRIEVE step from mktg_temp.p065_snet_input by way of an all-rows scan with no residual conditions into Spool 2, which is built locally on the AMPs. The size of Spool 2 is estimated with low confidence to be 236,880 rows. The estimated time for this step is 0.22 seconds.

2) We do an all-AMPs RETRIEVE step from CMDMDB.Static by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 26,605,513 rows. The estimated time for this step is 51.89 seconds.

4)

We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan into Spool 4, which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 4 by row hash. The size of Spool 4 is estimated with high confidence to be 26,605,513 rows. The estimated time for this step is 1 minute and 27 seconds.

5)

We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan., which is joined to Spool 4 (Last Use). Spool 2 and Spool 4 are left outer joined using a merge join, with a join condition of ("ACCT2 = ACCT"). The result goes into Spool 3, which is built locally on the AMPs. The size of Spool 3 is estimated with no confidence to be 73,019,005 to 6,302,313,919,440 rows. The estimated time for this step is 40.78 seconds to 704 hours and 11 minutes.

6)

Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

->

The contents of Spool 3 are sent back to the user as the result of statement 1. The total estimated time is 2 minutes and 59 seconds.

Query that *does not* work correctly:
-------------------------------------

Explanation --------------------------------------------------

1)

First, we lock a distinct mktg_temp."pseudo table" for read on a RowHash to prevent global deadlock for mktg_temp.t1.

2)

Next, we lock mktg_temp.t1 for read, and we lock CMDMDB.Static for access.

3)

We do an all-AMPs RETRIEVE step from mktg_temp.t1 by way of an all-rows scan with no residual conditions into Spool 2 fanned out into 6 hash join partitions, which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with no confidence to be 236,880 rows.

4)

We do an all-AMPs RETRIEVE step from Spool 2 by way of an all-rows scan into Spool 3 fanned out into 6 hash join partitions, which is duplicated on all AMPs. The size of Spool 3 is estimated with low confidence to be 66,326,400 rows. The estimated time for this step is 1 minute and 16 seconds.

5)

We do an all-AMPs RETRIEVE step from CMDMDB.Static by way of an all-rows scan with no residual conditions into Spool 4 fanned out into 6 hash join partitions, which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 26,605,513 rows. The estimated time for this step is 1 minute and 40 seconds.

6)

We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an all-rows scan, which is joined to Spool 4 (Last Use). Spool 3 and Spool 4 are joined using a hash join of 6 partitions, with a join condition of ("acct = acct"). The result goes into Spool 5, which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 5 by row hash. The size of Spool 5 is estimated with low confidence to be 316,810 rows. The estimated time for this step is 1 minute and 39 seconds.

7)

We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a RowHash match scan., which is joined to Spool 5 (Last Use). Spool 2 and Spool 5 are left outer joined using a merge join of 6 partitions, with a join condition of ("Field_1 = Field_1"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 316,810 rows. The estimated time for this step is 0.16 seconds.

8)

Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

->

The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 4 minutes and 35 seconds.

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Wed, 02 Oct 2002 @ 17:52:15 GMT