Archives of the TeradataForum

Message Posted: Fri, 21 Dec 2001 @ 14:23:32 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Performance optimization issue

From:		Sauli Harala

Most kind thanks for everyone who helped to troubleshoot this performance problem !

I have been busy to try the suggestions - so to answering here is slightly delayed.

Each of you pointed some piece of useful information - though something we had detected earlier by ourselves.

I'll try to combine some of the most adequate comments here:

* Teradata makes itself the decision to use the index or not - if you are not careful you spend time in table updates to keep up an index which is no used at all (one cannot give the query optimizer hints to use some index - though collecting of statistics may affect the optimizer strategy

* when using a index one must keep sure that the index condition is met in the subqueries "using IN, nested queries, or derived tables"

* indication of the proper index use is found by explain log entry "a ROW HASH MATCH SCAN across ALL-AMPS"

* if the index is not used the result of the analysis is the 'FULL TABLE SCAN' where the performance time grows when the size of the history table grows

* keeping up an index information is a time/space consuming issue. Sometimes Teradata is much better when you "manually" imitatate the index just building it from scratch.

* a keeping up join index might help, but you cannot multiload to a table which is a part of the join index - loading with 'tpump' or pure 'SQL' is OK but does not perform as well. Dropping and re-creating a join index with a big table takes time and space.

* :-) it is hard to simplify the problem properly for a discussion forum - when your Teradata "explain" gives '25' steps from your query (even without the update of the results) and the actual query is a join of six or more tables

We had already given up updating the secondary indexes - because we have not had much use for them.

After some trials and errors we ended up to the strategy, where the actual "purchase frequency analysis" is never made "directly" against the history table.

Instead:

1) There is a "one-shot" run to build the initial "customer's previous purchase" from the "purchase history" - it takes time, but that time is saved later

2) The purchase frequency is calulated by joining the "latest purchase" with the "customer's previous purchase".

3) When the "latest purchase" rows are inserted to the "purchase history" the "customer's previous purchase" table is dropped and recreated by merging the "customer's previous purchase" with the "latest purchase"

4) By following these steps the performance is not too fast yet (about 25 minutes in our two node system) for a bunch of almost 1.000.000 latest receipts - but it is tolerable now.

(We also tested by adding both the previous and latest purchase to the same table, but because its size was in average case much bigger than the pure "latest purchase", the self-join was slower in that case)

If you still have comments or improvement suggestions please let me know.

Wishing You a Merry Christmas !

---
Sauli Harala
EDS Oy


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference