Archives of the TeradataForum

Message Posted: Fri, 08 Oct 2004 @ 23:51:03 GMT



Subj:		Hey, I've got a question: Sampling Statistics

From:		Frank C. Martinez IV

All hail, fellow Teradactyls,

We've been having some interesting (ouch) times with sampled statistics, mainly because the sampling algorithm uses:

"scaling logic (that) assumes that the underlying data has an unbounded domain (i.e., #unique values grows proportionally with the size of the table). While this algorithm is accurate for data that is unique or nearly unique, it is not as accurate for data with bounded domains."

Of course, this will all be solved when we get to V2R5.1.2 (someday). In the meantime, it is making us much more cautious about using sampled statistics, even on big tables, where they are most useful (for example, sampling date columns kinda sucks).
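
To make the quoted caveat concrete, here is a toy sketch in Python. It is an illustration only, not Teradata's actual sampling code: the function name, the 2% sample fraction, the table size, and the plain linear scale-up are all assumptions made for the example. It counts distinct values in a random sample and scales that count by the inverse of the sample fraction, which is what an "unbounded domain" assumption amounts to in its simplest form.

```python
import random


def scaled_distinct_estimate(column, sample_fraction, seed=0):
    """Estimate the distinct-value count of `column` from a random sample.

    Hypothetical helper: scales the sample's distinct count linearly,
    i.e. assumes distinct values grow in proportion to the row count.
    """
    rng = random.Random(seed)
    sample_size = max(1, round(len(column) * sample_fraction))
    sample = rng.sample(column, sample_size)
    distinct_in_sample = len(set(sample))
    # Linear scale-up, capped at the row count.
    return min(len(column), round(distinct_in_sample / sample_fraction))


ROWS = 200_000          # assumed table size
FRACTION = 0.02         # assumed sample percentage

rng = random.Random(42)
unique_ids = list(range(ROWS))                      # unbounded domain: every value unique
dates = [rng.randrange(365) for _ in range(ROWS)]   # bounded domain: one year of dates

est_ids = scaled_distinct_estimate(unique_ids, FRACTION)
est_dates = scaled_distinct_estimate(dates, FRACTION)

print("unique column: true", len(set(unique_ids)), "estimated", est_ids)
print("date column:   true", len(set(dates)), "estimated", est_dates)
```

Under these assumptions the unique column comes back on the money, while the date column is overestimated many times over, because nearly every date already shows up in the sample and then gets multiplied by the scale factor anyway.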

So I was wondering whether any of you have ever:

1) had the same problem;

2) changed the default sampling percentage in the internal GDO parameters; or

3) come up with any other rules of thumb for sampled stats.

Thanks in advance!