|
|
Archives of the TeradataForum
Message Posted: Thu, 24 May 2007 @ 13:31:59 GMT
Subj: | | Re: Minimal Stats Sampling and a well-informed |
|
From: | | Ballinger, Carrie |
When you specify USING SAMPLE on a non-indexed column that is not almost-unique, you will experience what you are describing below. The
number of distinct values will end up being extrapolated upwards, depending on the percentage of rows sampled. If 2% of the rows were sampled,
then you could expect to see about 50 times more distinct values in the produced estimate; If 50% were sampled the estimate will be near double
the actual number of distinct values (in your case 150k estimate with 70k distinct values).
For that reason, USING SAMPLE works best with columns or indexes that are higly unique, whose number of distinct values apporahces the row
count of the table.
However, you may find your plans are acceptable even if the produced estimates are off. There is no rule of thumb for what will be acceptable
at your site. It will depend on demographics, and you would need to examine explains of frequently executed queries to determine that.
Even just using USING SAMPLE on UPI columns or near-unique non-indexed columns could be a noteable savings of time and resources on large
tables.
Thanks, -Carrie
| |