Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Fri, 08 Aug 2003 @ 18:57:07 GMT




Subj:   Re: Need help in understanding poor performance
 
From:   Dieter Noeth

Anomy Anom wrote:

  The following message is the explanation of the SQL statement for Dieter.


> Explanation
>   5) We execute the following steps in parallel.
>        1) We do an all-AMPs RETRIEVE step from
>           PMART.TXN_SHOPPING_TRANSACTION_2003 by way of an all-rows
>           scan with a condition of ("(EXTRACT(YEAR FROM
>           (PMART.TXN_SHOPPING_TRANSACTION_2003.TransactionProcessDT
>           )))= 2003") into Spool 2, which is redistributed by hash code
>           to all AMPs.  Then we do a SORT to order Spool 2 by row hash
>           and the sort key in spool field1 eliminating duplicate rows.
>           The input table will not be cached in memory, but it is
>           eligible for synchronized scanning.  The result spool file
>           will not be cached in memory.  The size of Spool 2 is
>           estimated with no confidence to be 3,401,551 rows.  The
>           estimated time for this step is 59 minutes and 4 seconds.

The spool is not cached, so it's more IOs to sort it, and you'll have to IO it again in the join in step 6. The Optimizer doesn't use statistics for EXTRACT, so it assumes 10% of the table size (table is ~34,015,510 rows?). If you use BETWEEN DATE '2003-01-01' AND DATE '2003-12-31' instead of EXTRACT it will help the optimizer, but the plan will probably still be the same :-(
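To illustrate the two forms of the predicate (a sketch only, assuming TransactionProcessDT is a DATE column):

```sql
-- Function on the column: the optimizer can't use column statistics,
-- so it falls back to a fixed 10% selectivity estimate.
WHERE EXTRACT(YEAR FROM TransactionProcessDT) = 2003

-- Equivalent range predicate on the bare column: any collected
-- statistics on TransactionProcessDT can be used for the estimate.
WHERE TransactionProcessDT BETWEEN DATE '2003-01-01' AND DATE '2003-12-31'
```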


>        2) We do an all-AMPs RETRIEVE step from
>           PDATA.TXN_SHOPPING_TRAN_C by way of an all-rows scan with a
>           condition of ("(EXTRACT(YEAR FROM
>           (PDATA.TXN_SHOPPING_TRAN_C.TransactionProcessDT )))= 2003")
>           into Spool 3, which is redistributed by hash code to all AMPs.
>           Then we do a SORT to order Spool 3 by row hash and the sort
>           key in spool field1 eliminating duplicate rows.  The input
>           table will not be cached in memory, but it is eligible for
>           synchronized scanning.  The size of Spool 3 is estimated with
>           no confidence to be 138,172 rows.  The estimated time for
>           this step is 2 minutes and 24 seconds.

Table size ~1,381,720 rows?


>   6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
>      all-rows scan, which is joined to Spool 3 (Last Use).  Spool 2 and
>      Spool 3 are joined using an exclusion merge join, with a join
>      condition of ("Field_1 = Field_1").  The result goes into Spool 1,
>      which is built locally on the AMPs.  The size of Spool 1 is
>      estimated to be 3,332,465 rows.  The estimated time for this step
>      is 2 minutes and 46 seconds.
>   7) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
>      an all-rows scan into Spool 4, which is redistributed by hash code
>      to all AMPs.  Then we do a SORT to order Spool 4 by row hash.  The
>      result spool file will not be cached in memory.  The size of Spool
>      4 is estimated with no confidence to be 3,332,465 rows.  The
>      estimated time for this step is 4 hours and 16 minutes.

Again spool is not cached.


>   8) We do a MERGE into PMART.TEMP_TXN_SHOPPING_TRAN from Spool 4 (Last
>      Use).
>   9) We spoil the parser's dictionary cache for the table.
>  10) Finally, we send out an END TRANSACTION step to all AMPs involved
>      in processing the request.
>   -> No rows are returned to the user as the result of statement 1.

  *** Insert completed. 32689172 rows added.
  *** Total elapsed time was 4 hours, 21 minutes and 52 seconds.

This is faster than the explain expected ;-)

Is the PI of both source tables the same? You should try to get rid of the preparation steps that spool the source tables. Try rewriting your query using NOT EXISTS; it looks ugly, but it probably results in an AMP-local join step:

SEL * FROM PMART.TXN_SHOPPING_TRANSACTION_2003 AS T1
WHERE EXTRACT(YEAR FROM T1.TransactionProcessDT) = 2003
AND NOT EXISTS
(SEL * FROM PDATA.TXN_SHOPPING_TRAN_C AS T2
  WHERE T1.column1 = T2.column1
  AND T1.column2 = T2.column2
  AND T1.column3 = T2.column3
  AND ....
  AND EXTRACT(YEAR FROM T2.TransactionProcessDT) = 2003);

And maybe try BETWEEN DATE '2003-01-01' AND DATE '2003-12-31' also...
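Combining both suggestions, the rewrite might look like this (a sketch only; column1/column2 stand in for your actual join columns, and TransactionProcessDT is assumed to be a DATE column):

```sql
SEL *
FROM PMART.TXN_SHOPPING_TRANSACTION_2003 AS T1
WHERE T1.TransactionProcessDT BETWEEN DATE '2003-01-01' AND DATE '2003-12-31'
AND NOT EXISTS
 (SEL * FROM PDATA.TXN_SHOPPING_TRAN_C AS T2
  WHERE T1.column1 = T2.column1
  AND T1.column2 = T2.column2
  AND ....
  AND T2.TransactionProcessDT BETWEEN DATE '2003-01-01' AND DATE '2003-12-31');
```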


Dieter



     
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023