Archives of the TeradataForum

Message Posted: Fri, 16 Feb 2001 @ 19:16:37 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Collect Stat

From:		Michael McIntire

The general rules of thumb that I use for collecting stats are as follows:

Collect stats on:

1) Primary Indexes

2) Any column used in a join or constraint clause.

�pdate stats:

1) When the distribution or cardinality of a column/index changes more than 2-3%.

2) When a substantial amount of the updates occur on a small number of values, effecting distribution of rows/value.

3) When the new or updated values fall outside the min or max value of the table.

4) When the maximum refresh time has been reached

When designing ETL systems, I try to lump candidate columns into appriopriate sets based on frequencies: Daily, Weekly, Bi-weekly, weekly rotation (to distribute collection of monthly stats over the maintainence window) and finally at the Lifespan value.

Update Case 4 exists only because I believe all stats should have a finite lifespan.

Update Case 3 is typically found when inserting rows with dates which are subsequently used in query constraints. Subjective experience indiciates that the optimizer's tolerance for values outside the min/max is good for one or two incrementially larger/smaller values. (as an aside: I am uncertain of the impact in the case of value ordered NUSI, which is likely to be more important)

This leads to a situation where a historized database can cause a substantive number of collect stats after every batch cycle. An a clear indicator of the problem can be found in correlated subqueries to find the "most recient" row for a given set of tuples deliniated by by the date/time stamp.

So: Care should be used when designing historical tables that the collect statistics overhead does not outweigh the usefullness of using dates to find the most recient record in a set...


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference