Archives of the TeradataForum
Message Posted: Wed, 07 Jan 2004 @ 17:01:36 GMT
Statistics can now be sample-based, so the formulae get more and more complicated, but the extreme case (when the sample coincides with the entire population, as in R4) seems easy to grasp, at least in broad outline.
First the frequency distribution is calculated. From it, TD knows the number of loners, i.e. values whose frequency is greater than or equal to 1/200 of the base table cardinality. Interval allocation starts with the loners, which go into high-biased intervals. By design a high-biased interval holds at most two values (1 or 2 loners), and there can be at most 99 high-biased intervals.
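To make the loner rule concrete, here is a rough Python sketch of it. It is only an illustration, not Teradata's actual code: the function names `find_loners` and `pack_high_biased` are made up, and the greedy "two per interval, in value order" packing is purely an assumption, since the real packing rule has not been published as far as I know.

```python
from collections import Counter


def find_loners(values, divisor=200):
    """Return {value: frequency} for values whose frequency is at least
    1/divisor of the total row count (the loner threshold from the post)."""
    freq = Counter(values)
    total = len(values)
    # Integer comparison (count * divisor >= total) avoids float rounding.
    return {v: c for v, c in freq.items() if c * divisor >= total}


def pack_high_biased(loners, per_interval=2):
    """ASSUMPTION: pack loners greedily, two per interval, in value order.
    The real rule for choosing one or two loners per interval is unknown."""
    ordered = sorted(loners)
    return [ordered[i:i + per_interval]
            for i in range(0, len(ordered), per_interval)]


# Hypothetical column of 1000 rows: two skewed values plus 920 unique ones.
column = [1] * 50 + [2] * 30 + list(range(100, 1020))
loners = find_loners(column)            # threshold is 1000 / 200 = 5 rows
high_biased = pack_high_biased(loners)
print(loners, high_biased)
```

With this made-up data only the values 1 and 2 reach the 5-row threshold, so they end up sharing a single high-biased interval.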
When done with loners, the database builds the remaining equal-height intervals. Their number can be anything from 1 to (100 - the number of high-biased intervals).
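The equal-height step could be sketched along these lines. Again this is only a guess at the general shape: `equal_height_intervals` is a hypothetical name, and cutting the sorted rows into chunks of near-equal row count (which can split one value across two intervals) is a simplification of whatever the database really does.

```python
def equal_height_intervals(rows, high_biased_count, max_intervals=100):
    """Split the non-loner rows into intervals of near-equal row count.

    Returns a list of (min_value, max_value, row_count) tuples.
    ASSUMPTION: the database uses every interval left over after the
    high-biased ones; the post only says the number is between 1 and
    (100 - number of high-biased intervals).
    """
    rows = sorted(rows)
    if not rows:
        return []
    n = max(1, min(max_intervals - high_biased_count, len(rows)))
    base, extra = divmod(len(rows), n)
    intervals = []
    start = 0
    for i in range(n):
        # The first `extra` intervals take one additional row each.
        size = base + (1 if i < extra else 0)
        chunk = rows[start:start + size]
        intervals.append((chunk[0], chunk[-1], len(chunk)))
        start += size
    return intervals


# Hypothetical remainder: 920 unique values, one high-biased interval in use.
remaining = list(range(100, 1020))
intervals = equal_height_intervals(remaining, high_biased_count=1)
print(len(intervals), intervals[0], intervals[-1])
```

Here 99 intervals are left for 920 rows, so each interval holds 9 or 10 rows.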
I don't think all the practical details of this algorithm have been published: for example, exactly how the database decides whether to store one or two loners per interval. Perhaps somebody from NCR would comment on this, if they are so inclined.
Last Modified: 27 Dec 2016