
Archives of the TeradataForum
Message Posted: Thu, 29 Jan 2004 @ 17:16:57 GMT
Jon,
I am in Dieter's camp because, in my opinion, he has made a valid point. I can't comment on James's one-liner as there is no context there. However, I think I understand where you are coming from.

The main point of misunderstanding here (as I see it) is the standard terminology used in the TD manuals. The words "data skew" in TD seem to stick to the PI domain only, although the term comes from mathematical statistics, where it applies to *any* type of data distribution. When the TD manuals discuss statistics they use other terms, like "bias". It is probably a good idea to use synonyms in different contexts; some books on statistics might be doing the same (there are sometimes hundreds of years of tradition behind some terms). To me, however, these terms describe pretty much the same properties of a data distribution.

When I see how TD defines and stores loner values while collecting statistics on columns (not necessarily PI columns), I can't help thinking that this is exactly what "skew" is all about, even if it is called "high-biased intervals". It is the skew of the column(s) distribution. Of course, there are differences in the numeric implementations. The PI algorithm is probably more sophisticated than stats, with their fixed 100 intervals and the correspondingly simple definition of loner values, etc. Such differences, however, do not change the fact that the underlying ideas are essentially the same.

When you say "We record no distribution info in stats", those of us with a mathematical background, who are not necessarily TD-manual-minded, are knocked down. When we recover, we understand that you are probably talking about the *PI distribution*. That one can indeed be very far from statistics (you described the cases when they are close), but that was absolutely not what Dieter meant in the first place, if I read his posting correctly.

Hope these ramblings will help both "camps" to understand each other.

Regards,
Victor
 
 
Copyright 2016  All Rights Reserved  
Last Modified: 28 Jun 2020  