Archives of the TeradataForum
Message Posted: Thu, 29 Jan 2004 @ 17:16:57 GMT
I am in Dieter's camp because he has made a valid point, in my opinion. I can't comment on James's one-liner as there is no context there. However, I think I understand where you are coming from. The main point of misunderstanding here (as I see it) is the standard terminology as used in TD manuals.
The words "data skew" in TD seem to stick to the PI domain only, although the term comes from mathematical statistics, where it applies to *any* type of data distribution. When TD manuals discuss statistics they use other terms, like "bias". It is probably a good idea to use synonyms in different contexts; some books on statistics might be doing the same (there are sometimes hundreds of years of tradition behind some terms). However, to me these terms describe pretty much the same properties of a data distribution. When I see how TD defines and stores loner values while collecting statistics on columns (not necessarily in the PI), I can't help thinking that this is exactly what "skew" is all about, even if it is called "high-biased intervals". It is the skew of the column(s) distribution.
Of course, there are differences in the numeric implementations. The PI algorithm is probably more sophisticated than stats, with their fixed 100 intervals and the corresponding simple definition of loner values, etc. Such differences, however, do not change the fact that the underlying ideas are essentially the same.
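To make the "same idea" argument concrete, here is a toy sketch in Python. It is emphatically not how TD implements either mechanism: the loner rule (a value whose row count exceeds one interval's share of the rows), the CRC32 hash, and the names `loner_values` and `amp_skew_factor` are all assumptions of mine, chosen only for illustration. It shows that a single heavily repeated value stands out both in an interval-style summary of a column and in a hash distribution of the same column across AMPs.

```python
import zlib
from collections import Counter

def loner_values(values, n_intervals=100):
    """Values frequent enough to stand out of an equal-height histogram.

    Assumed rule (illustrative only): a value is a "loner" when its row
    count exceeds the share of rows a single interval would hold.
    """
    counts = Counter(values)
    threshold = len(values) / n_intervals
    return {v: c for v, c in counts.items() if c > threshold}

def amp_skew_factor(values, n_amps=10):
    """Rows on the busiest hypothetical AMP divided by the average per AMP.

    CRC32 stands in for the real row hash purely to get a deterministic
    assignment of values to AMPs.
    """
    rows_per_amp = [0] * n_amps
    for v in values:
        rows_per_amp[zlib.crc32(repr(v).encode()) % n_amps] += 1
    return max(rows_per_amp) / (len(values) / n_amps)

# One value repeated 500 times plus 500 unique values.
column = [0] * 500 + list(range(1, 501))
print(loner_values(column))     # {0: 500}
print(amp_skew_factor(column))  # well above 1.0: one AMP holds all the zeros
```

The same repeated value drives both numbers, which is the sense in which I call the two notions one property of the data distribution.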
When you say "We record no distribution info in stats", those of us with a mathematical background who are not necessarily TD-manual-minded are knocked down. When we recover, we understand that you are probably talking about the *PI distribution*. That one can indeed be very far from statistics (you described the cases when they are close), but that was absolutely not what Dieter meant in the first place, if I read his posting correctly.
Hope these ramblings will help both "camps" to understand each other.
Last Modified: 28 Jun 2020