Archives of the TeradataForum
Message Posted: Sat, 19 May 2007 @ 23:01:10 GMT
<-- Anonymously Posted: Wednesday, May 16, 2007 10:44 -->
How accurate do sampled column stats have to be to keep the Optimizer well informed?
There seems to be a large gap in stats accuracy when column level stats are collected using sampling vs. 100% collection.
As the TD manuals predict, upping the CollectStatsSample parm improves the column level statistics but leaves a lot to be desired.
In one case, a table with 12M, evenly distributed (UPI) rows had a column with 70K distinct values. A standard collect stats shows this number.
Sampling with a default CollectStatsSample parm value (2%) showed about 1.2M distinct values.
Sampling with the parm at 50% showed about 150K distinct values (before I started this, I expected a 20-30% sampling to be as good as the standard walk-the-table approach.)
Sampling with the parm at 70% showed about 100K distinct values.
Note, when the attribute was created as an index, all levels of sampling returned 70K distinct values (which the manuals suggested would occur).
Research in the Forum showed several discussions on how to collect stats, triggering row counts to drive col-stat collection, but I haven't been able to find in the manuals or the Forum some recommendations for the initial question above.
So, any thoughts on how accurate the sampled column stats need to be to help create (even better: ensure) good plans?
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|