Archives of the TeradataForum

Message Posted: Fri, 26 Nov 2010 @ 21:04:45 GMT



Subj:		Re: Sampled statistics in a production environment

From:		Ballinger, Carrie

Prior to Teradata 13.0, I agree with what YB has already said: sampled stats are useful almost exclusively for columns that are nearly unique (where about 95% of the values are unique). Certainly you can use sampled stats for all your unique columns.

However, in Teradata 13.0, the algorithm used for sampling when you specify USING SAMPLE has been redone, and it is now quite a bit more accurate for non-unique columns. If you are on T13.0 now, or when you get to T13.0, I suggest you try it out and see whether the feature is more useful for you.

Where T13.0 sampled stats are still limited is with columns that have skew. So sampling is good for non-unique columns with more or less even distribution, but use full stats collection with skewed data.

One thing to remember with sampled stats is that the default sample size is 2%, which is pretty low. The limitations of USING SAMPLE, in either T12 or T13.0, can be somewhat counterbalanced by increasing the sampling percentage of the collection (20%, 50%, etc.). But you have to try it yourself and compare the number of distinct values calculated with sampling vs. with full stats collection.
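To make that comparison concrete, here is a minimal Python sketch of the bookkeeping, assuming you have already collected stats on the same column at several sampling percentages and once in full, and written down the distinct-value count each run reported. The function names, the 10% tolerance, and all the numbers are hypothetical and purely for illustration; nothing here talks to Teradata or reflects how the database computes its estimates.

```python
def relative_error(sampled_ndv, full_ndv):
    """Relative difference between a sampled and a full distinct-value count."""
    if full_ndv <= 0:
        raise ValueError("full_ndv must be positive")
    return abs(sampled_ndv - full_ndv) / full_ndv


def smallest_acceptable_sample(sampled_by_percent, full_ndv, tolerance=0.10):
    """Return the lowest sampling percentage whose distinct-value count is
    within `tolerance` of the full-stats count, or None if none qualifies."""
    for percent in sorted(sampled_by_percent):
        if relative_error(sampled_by_percent[percent], full_ndv) <= tolerance:
            return percent
    return None


# Hypothetical distinct-value counts for one column (not real measurements).
full_ndv = 48_000
sampled = {2: 31_500, 20: 44_900, 50: 47_600}

for percent in sorted(sampled):
    err = relative_error(sampled[percent], full_ndv)
    print(f"{percent:>3}% sample: off by {err:.1%}")

best = smallest_acceptable_sample(sampled, full_ndv)
print("lowest acceptable sampling percentage:", best)
```

If no percentage comes in under the tolerance you pick, that is a hint the column (for example, a skewed one) is better served by full stats collection.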

Thanks, -Carrie

