Archives of the TeradataForum
Message Posted: Fri, 26 Nov 2010 @ 21:04:45 GMT
Subj: Re: Sampled statistics in a production environment
From: Ballinger, Carrie
Prior to Teradata 13.0, I agree with what YB has already said: Sampled stats are useful almost exclusively for columns that are nearly unique
(95% of the values are unique). Certainly you can use sampled stats for all your unique columns.
However, in Teradata 13.0, the algorithm used for sampling when you specify USING SAMPLE has been redone, and it is now quite a bit more
accurate for non-unique columns. If you are on T13.0 now, or when you get to T13.0, I suggest you try it out and see if the feature might be more
useful for you.
Where T13.0 sampled stats are still limited is with columns that have skew. So they are good for non-unique columns with more or less even
distribution, but use full stats collection with skewed data.
One thing to remember with sampled stats is that the default sample size is 2%, which is pretty low. The limitations of USING SAMPLE, either in T12 or
T13.0, can be somewhat counterbalanced by increasing the sampling percentage of the collection (20%, 50%, etc.). But you have to try it
yourself and compare the number of distinct values being calculated with sampling vs. with full stats collection.
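To make that comparison concrete, here is a rough Python sketch of how the check could be scripted once you have both numbers in hand. It assumes you have already read the distinct-value counts off the sampled and the full collection yourself; the function names, the 10% tolerance, and the sample figures are all hypothetical and are not anything Teradata provides or recommends.

```python
def distinct_value_error(sampled_ndv: int, full_ndv: int) -> float:
    """Relative error of the sampled distinct-value count vs. the full count."""
    if full_ndv <= 0:
        raise ValueError("full_ndv must be positive")
    return abs(sampled_ndv - full_ndv) / full_ndv


def sample_is_acceptable(sampled_ndv: int, full_ndv: int,
                         tolerance: float = 0.10) -> bool:
    """True if the sampled count is within `tolerance` of the full count.

    The 10% default is an arbitrary assumption; pick whatever your
    query plans can live with.
    """
    return distinct_value_error(sampled_ndv, full_ndv) <= tolerance


if __name__ == "__main__":
    # Hypothetical counts for one column: (sampled, full) per sampling percentage.
    observations = {
        "2%": (61_000, 100_000),
        "20%": (93_000, 100_000),
    }
    for pct, (sampled, full) in observations.items():
        err = distinct_value_error(sampled, full)
        verdict = "ok" if sample_is_acceptable(sampled, full) else "use a larger sample or full stats"
        print(f"sample {pct}: relative error {err:.1%} -> {verdict}")
```

If a column still fails the check at a high sampling percentage, that is a hint it is skewed and belongs in the full-stats group.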
Thanks, -Carrie