Archives of the TeradataForum

Message Posted: Sat, 11 Jun 2005 @ 06:52:23 GMT



Subj:		Re: statistics AFTER reconfig ?

From:		Dieter Noeth

Victor Sokovin wrote:

I should probably materialize my doubts as follows.

Data skew can influence the histogram type chosen by Teradata to store statistics. For me "different histogram types" means "different statistics".

Ack

Now, what is data skew in the context of collecting statistics? To answer this, let's go to the "Statement and Transaction Processing" manual, one of the few manuals shedding some light on the topic:

"Skew: A measure of the asymmetry of the distribution of a set of attribute values.

Skewness is the third moment of the probability density function for a population of attribute values. The first two moments are the mean and the standard deviation, respectively.

With respect to skew in parallel databases, there are several possible types.

- Attribute value skew refers to skew that is inherent in the data. An example might be a column that can take on only two values.

Only that kind of information is stored within statistics.

Would you expect different result sets for different configurations when you submit a "select col, count(*) from table group by 1;"?

This is the first step of a Collect Stats; the result set is then chopped into up to 100 intervals.

If it really were configuration-dependent, then the System Emulation Tool would be useless :-)
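
As a rough illustration of that argument, here is a minimal Python sketch (not Teradata code). It emulates the "group by" step on a varying number of AMPs and then chops the result into at most 100 intervals. The names amp_of, value_counts and chop are hypothetical, crc32 is only a stand-in for the real row hash, and the chunking is a simplification of whatever interval rules the real histogram uses.

```python
from collections import Counter
import zlib


def amp_of(value, n_amps):
    """Hypothetical stand-in for row hashing; NOT Teradata's hash function."""
    return zlib.crc32(repr(value).encode("utf-8")) % n_amps


def value_counts(rows, n_amps):
    """Emulate 'select col, count(*) from table group by 1' on n_amps AMPs."""
    local = [Counter() for _ in range(n_amps)]
    for value in rows:
        local[amp_of(value, n_amps)][value] += 1  # per-AMP aggregation
    merged = Counter()
    for counts in local:
        merged.update(counts)  # global aggregation across AMPs
    return sorted(merged.items())


def chop(counts, max_intervals=100):
    """Split sorted (value, count) pairs into at most max_intervals chunks.

    Assumption: plain contiguous chunks of distinct values, a simplification
    of the real interval-building rules.
    Returns (max_value, distinct_values, row_count) per interval.
    """
    n = len(counts)
    k = min(max_intervals, n)
    intervals = []
    for i in range(k):
        chunk = counts[i * n // k:(i + 1) * n // k]
        intervals.append((chunk[-1][0], len(chunk), sum(c for _, c in chunk)))
    return intervals


rows = [i % 250 for i in range(10_000)] + [7] * 5_000  # value 7 is skewed
same = value_counts(rows, 4) == value_counts(rows, 64)
print("identical result on 4 and 64 AMPs:", same)
print("intervals:", len(chop(value_counts(rows, 4))))
```

The number of AMPs only changes where the rows are counted, not what the merged counts are, so the intervals built from them come out the same in this sketch.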

- Partition skew refers to skew that results from an uneven distribution of data across the AMPs.

The difference is apparent from the context. As used in this manual, the term usually refers to the partition skew that occurs when the primary index for a table is defined on a column set that is highly nonunique."

I must admit that the difference between the two is not always obvious to me so I have to assume that when the manual talks about such things as data skew and statistical compensation for skewness (leading to larger samples, different histogram types, and, therefore, to different statistics) then it talks about the partition skew. The latter might very well depend on the number of AMPs.

Is this enough ground for my doubts about Carrie's statement?

No ;-)
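
To make the two kinds of skew concrete, here is a small Python sketch under stated assumptions: crc32 modulo the AMP count stands in for the real hash map, and rows_per_amp and skew_factor are hypothetical helper names. It shows partition skew changing with the number of AMPs while the value frequencies of the column stay the same.

```python
from collections import Counter
import zlib


def rows_per_amp(pi_values, n_amps):
    """Count rows landing on each AMP for a given primary index column.

    Assumption: crc32 modulo n_amps stands in for the real hash map.
    """
    per_amp = [0] * n_amps
    for value in pi_values:
        per_amp[zlib.crc32(repr(value).encode("utf-8")) % n_amps] += 1
    return per_amp


def skew_factor(per_amp):
    """Busiest AMP relative to the average AMP (1.0 means perfectly even)."""
    return max(per_amp) / (sum(per_amp) / len(per_amp))


# Highly nonunique primary index: only two distinct values.
pi_values = ["A"] * 900 + ["B"] * 100

# Partition skew: depends on how many AMPs the rows are spread over.
for n_amps in (1, 2, 8):
    print(n_amps, "AMPs, skew factor:", skew_factor(rows_per_amp(pi_values, n_amps)))

# Attribute value skew: computed from the data alone, n_amps never appears.
print(Counter(pi_values))
```

With only two distinct values, at most two AMPs can hold rows, so the skew factor grows as AMPs are added, while the Counter output does not change.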

But maybe there could be a minor difference for Sample Stats...

Dieter

