Archives of the TeradataForum

Message Posted: Wed, 07 Jan 2004 @ 17:01:36 GMT



Subj:		Re: Intervals for Statistics

From:		Victor Sokovin

Statistics can now be sample-based, so the formulae get more and more complicated, but the extreme case (when the sample coincides with the entire population, as in R4) seems easy to grasp, at least in broad outline.

First the frequency distribution is calculated. From it TD knows the number of loners, i.e. values whose frequency is greater than or equal to 1/200 x (base table cardinality). The allocation of intervals starts with the loners, which are placed in high-biased intervals. By design a high-biased interval holds at most two values (1 or 2 loners), and there can be at most 99 high-biased intervals.
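A minimal Python sketch of the loner step, assuming only the rules stated above: a value is a loner if its frequency is at least 1/200 of the cardinality, a high-biased interval holds one or two loners, and there are at most 99 such intervals. The function names and the rule of pairing loners in ascending value order are hypothetical; as noted below, the actual rule for one vs. two loners per interval is not published.

```python
from collections import Counter

LONER_DIVISOR = 200           # loner threshold: frequency >= cardinality / 200
MAX_LONERS_PER_INTERVAL = 2   # a high-biased interval holds 1 or 2 loners
MAX_HIGH_BIASED = 99          # cap on the number of high-biased intervals


def find_loners(column):
    """Return {value: frequency} for values with frequency >= cardinality / 200."""
    freq = Counter(column)
    cardinality = len(column)
    # Compare f * 200 >= cardinality to avoid floating-point division.
    return {v: f for v, f in freq.items() if f * LONER_DIVISOR >= cardinality}


def high_biased_intervals(column):
    """Pack loners into intervals of at most two loners each.

    Hypothetical rule: loners are paired in ascending value order. Since each
    loner holds at least 1/200 of the rows, there are at most 200 loners, so
    the 99-interval cap only matters in that degenerate case.
    """
    loners = sorted(find_loners(column).items())
    intervals = [
        loners[i:i + MAX_LONERS_PER_INTERVAL]
        for i in range(0, len(loners), MAX_LONERS_PER_INTERVAL)
    ]
    return intervals[:MAX_HIGH_BIASED]
```

For a 1,000-row column where values 1, 2 and 3 occur 50 times each and 850 other values occur once, the threshold is 5 rows, so only 1, 2 and 3 are loners. They end up in two high-biased intervals: one holding 1 and 2, one holding 3.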

When done with loners, the database builds the remaining equal-height intervals. Their number can be anything from 1 to (100 - the number of high-biased intervals).
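The equal-height step can be sketched in the same spirit. This assumes the non-loner rows are split into `100 - num_high_biased` intervals (never fewer than 1) of roughly equal row counts. The function name, the choice never to split one value across two intervals, and the simplified `(max_value, row_count, distinct_count)` summary per interval are all illustrative assumptions, not Teradata's documented layout.

```python
import math
from collections import Counter

TOTAL_INTERVALS = 100  # high-biased + equal-height intervals together


def equal_height_intervals(non_loner_values, num_high_biased):
    """Split the non-loner rows into roughly equal-height intervals.

    non_loner_values: column values left after the loners are removed.
    num_high_biased:  number of high-biased intervals already allocated.
    Returns a list of (max_value, row_count, distinct_count) tuples.
    Assumption: a value is never split across intervals, so heights are
    only approximately equal.
    """
    k = max(1, TOTAL_INTERVALS - num_high_biased)
    freq = sorted(Counter(non_loner_values).items())
    total = sum(f for _, f in freq)
    target = math.ceil(total / k) if total else 0

    intervals = []
    rows = distinct = 0
    for value, f in freq:
        rows += f
        distinct += 1
        # Close the interval once it reaches the target height, keeping the
        # last slot free for whatever rows remain.
        if rows >= target and len(intervals) < k - 1:
            intervals.append((value, rows, distinct))
            rows = distinct = 0
    if rows:  # leftover rows go into the final interval
        intervals.append((value, rows, distinct))
    return intervals
```

For example, with 98 high-biased intervals only two equal-height intervals remain, so `equal_height_intervals(list(range(10)), 98)` yields two intervals of five rows each, ending at values 4 and 9.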

I don't think all the practical details of this algorithm have been published, for example exactly how the database decides whether to store one or two loners per interval. Perhaps somebody from NCR could comment on this, if they are so inclined.

Regards,

Victor

