Archives of the TeradataForum
Message Posted: Thu, 24 May 2007 @ 13:31:59 GMT
When you specify USING SAMPLE on a non-indexed column that is not almost-unique, you will experience what you are describing below. The number of distinct values will end up being extrapolated upwards, depending on the percentage of rows sampled. If 2% of the rows were sampled, then you could expect to see about 50 times more distinct values in the produced estimate; If 50% were sampled the estimate will be near double the actual number of distinct values (in your case 150k estimate with 70k distinct values).
For that reason, USING SAMPLE works best with columns or indexes that are higly unique, whose number of distinct values apporahces the row count of the table.
However, you may find your plans are acceptable even if the produced estimates are off. There is no rule of thumb for what will be acceptable at your site. It will depend on demographics, and you would need to examine explains of frequently executed queries to determine that.
Even just using USING SAMPLE on UPI columns or near-unique non-indexed columns could be a noteable savings of time and resources on large tables.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|