Archives of the TeradataForum

Message Posted: Fri, 10 Jun 2005 @ 20:46:51 GMT



Subj:		Re: statistics AFTER reconfig ?

From:		Victor Sokovin

I should probably spell out my doubts as follows.

Data skew can influence the histogram type chosen by Teradata to store statistics. For me "different histogram types" means "different statistics".

Now, what is data skew in the context of collecting statistics? To answer this, let's go to the "Statement and Transaction Processing" manual, one of the few manuals that shed some light on the topic:

"Skew: A measure of the asymmetry of the distribution of a set of attribute values.

Skewness is the third moment of the probability density function for a population of attribute values. The first two moments are the mean and the standard deviation, respectively.

With respect to skew in parallel databases, there are several possible types.

- Attribute value skew refers to skew that is inherent in the data. An example might be a column that can take on only two values.

- Partition skew refers to skew that results from an uneven distribution of data across the AMPs.

The difference is apparent from the context. As used in this manual, the term usually refers to the partition skew that occurs when the primary index for a table is defined on a column set that is highly nonunique."

I must admit that the difference between the two is not always obvious to me. So I have to assume that when the manual talks about such things as data skew and statistical compensation for skewness (leading to larger samples, different histogram types and, therefore, different statistics), it means partition skew. Partition skew might very well depend on the number of AMPs, and a reconfig can change that number.
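
A small Python sketch can make the distinction concrete. It is only an illustration under stated assumptions: `amp_for` is a hypothetical stand-in that uses MD5 modulo the AMP count and is not Teradata's row hash, and `partition_skew` (busiest AMP divided by the average AMP) is just one convenient measure, not the one the optimizer uses. The attribute value skew is computed as the standardized third central moment of the column values.

```python
import hashlib
from statistics import mean, pstdev


def sample_skewness(values):
    """Attribute value skew: standardized third central moment of the values."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return 0.0
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def amp_for(value, n_amps):
    """Hypothetical row-to-AMP mapping (assumption: MD5, not Teradata's hash)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_amps


def rows_per_amp(pi_values, n_amps):
    """Count how many rows land on each AMP for the given primary index values."""
    counts = [0] * n_amps
    for v in pi_values:
        counts[amp_for(v, n_amps)] += 1
    return counts


def partition_skew(counts):
    """Rows on the busiest AMP divided by the average per AMP (1.0 means even)."""
    return max(counts) / (sum(counts) / len(counts))


# A highly nonunique primary index: 10,000 rows, only three distinct values.
pi_values = [1] * 7000 + [2] * 2000 + [3] * 1000

# Depends only on the data, not on the system configuration.
print("attribute value skew:", round(sample_skewness(pi_values), 3))

# Depends on the data and on the number of AMPs.
for n_amps in (2, 4, 16):
    counts = rows_per_amp(pi_values, n_amps)
    print(n_amps, "AMPs -> partition skew:", round(partition_skew(counts), 2))
```

In this toy setup the attribute value skew is the same for every configuration, while the partition skew grows as AMPs are added, because three distinct values can occupy at most three AMPs however many there are.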

Are these sufficient grounds for my doubts about Carrie's statement?

Regards,

Victor

