Archives of the TeradataForum
Message Posted: Fri, 16 Feb 2001 @ 19:16:37 GMT
The general rules of thumb that I use for collecting stats are as follows:
Collect stats on:
1) Primary Indexes
2) Any column used in a join or constraint clause.
1) When the distribution or cardinality of a column/index changes more than 2-3%.
2) When a substantial amount of the updates occur on a small number of values, effecting distribution of rows/value.
3) When the new or updated values fall outside the min or max value of the table.
4) When the maximum refresh time has been reached
When designing ETL systems, I try to lump candidate columns into appriopriate sets based on frequencies: Daily, Weekly, Bi-weekly, weekly rotation (to distribute collection of monthly stats over the maintainence window) and finally at the Lifespan value.
Update Case 4 exists only because I believe all stats should have a finite lifespan.
Update Case 3 is typically found when inserting rows with dates which are subsequently used in query constraints. Subjective experience indiciates that the optimizer's tolerance for values outside the min/max is good for one or two incrementially larger/smaller values. (as an aside: I am uncertain of the impact in the case of value ordered NUSI, which is likely to be more important)
This leads to a situation where a historized database can cause a substantive number of collect stats after every batch cycle. An a clear indicator of the problem can be found in correlated subqueries to find the "most recient" row for a given set of tuples deliniated by by the date/time stamp.
So: Care should be used when designing historical tables that the collect statistics overhead does not outweigh the usefullness of using dates to find the most recient record in a set...
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|