
Archives of the TeradataForum
Message Posted: Wed, 07 Jan 2004 @ 17:01:36 GMT
Statistics can now be sample-based, so the formulae get more and more complicated, but the extreme case (when the sample coincides with the entire population, as in R4) seems easy to grasp, at least in broad outline.

First the frequency distribution is calculated. From it TD knows the number of loners, that is, values whose frequency is greater than or equal to 1/200 x (base table cardinality). The allocation of intervals starts with the loners and the high-biased intervals. By design a high-biased interval holds at most two values (1 or 2 loners), and there can be at most 99 high-biased intervals. When done with the loners, the database builds the remaining equal-height intervals. Their number can be anything from 1 to (100 - the number of high-biased intervals).

I don't think all the practical details of this algorithm have been published, for example how exactly the database decides whether to store one or two loners per interval. Perhaps somebody from NCR would comment on this, if they are so inclined.

Regards,
Victor
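To make the description above concrete, here is a minimal Python sketch of the full-population case. It is an illustration only, not Teradata's actual implementation: the function names `find_loners` and `plan_intervals` are hypothetical, and the greedy packing of loners into high-biased intervals (controlled by the made-up parameter `loners_per_interval`) is an assumption, since the post notes that the real rule for storing one or two loners per interval is not published. Any loners left over once the 99-interval cap is reached are simply dropped from the high-biased list here, which is also an assumption.

```python
from collections import Counter

MAX_INTERVALS = 100      # total interval budget described in the post
MAX_HIGH_BIASED = 99     # cap on high-biased intervals described in the post


def find_loners(values):
    """Return (value, frequency) pairs with frequency >= cardinality / 200."""
    freq = Counter(values)
    cardinality = len(values)
    # Compare 200 * count with the cardinality to stay in integer arithmetic.
    loners = [(v, c) for v, c in freq.items() if 200 * c >= cardinality]
    # Most frequent first; ties keep first-seen order (sorted() is stable).
    return sorted(loners, key=lambda vc: -vc[1])


def plan_intervals(values, loners_per_interval=2):
    """Sketch the split between high-biased and equal-height intervals.

    ASSUMPTION: loners are packed greedily, `loners_per_interval` at a time.
    The real database's packing rule is not documented in the post.
    """
    if loners_per_interval not in (1, 2):
        raise ValueError("a high-biased interval holds 1 or 2 loners")
    loners = find_loners(values)
    high_biased = [
        loners[i:i + loners_per_interval]
        for i in range(0, len(loners), loners_per_interval)
    ][:MAX_HIGH_BIASED]
    # Upper bound on the number of equal-height intervals that may follow.
    max_equal_height = MAX_INTERVALS - len(high_biased)
    return high_biased, max_equal_height


if __name__ == "__main__":
    # 1080 rows: "a" and "b" are frequent, the other 1000 values are unique.
    column = ["a"] * 50 + ["b"] * 30 + list(range(1000))
    high_biased, max_equal_height = plan_intervals(column)
    print(high_biased)
    print(max_equal_height)
```

In this example the loner threshold is 1080 / 200 = 5.4 rows, so only "a" and "b" qualify. Packed two per interval they occupy one high-biased interval, leaving room for up to 99 equal-height intervals.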
 
 
Last Modified: 23 Jun 2019  