Archives of the TeradataForum

Message Posted: Wed, 12 Mar 2003 @ 15:39:29 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Collect stats on PI

From:		Kohut, Eric J

Well the Teradata Optimizer is a Cost Based Optimizer and as a result, it needs statistics to tell it the cost of the various requested tables to evaluate different access methods for fulfilling a user requests.

This is especially true for the PI of smaller to medium tables or tables with very uneven distribution.

The smaller the table, the more important, but due to reason #2 that follows, medium and larger tables especially with NUPI's should also have collected statistics if time permits. Generally you only need to recollect on medium and large tables as 10 % of the table has changed which is not very often.

The first reason for this is that the RDBMS will often (possibly not always) try to use a Random Amp sample of data to guess how big a table is and therefor the cost of access the data. On smaller tables there is a large chance that the random amp sample contains a bigger variance from the normal distribution of data. I.e. a table with 100 rows on a 200 amp machine would likely have 50 % of the amps with no rows on them.

The second reason is that the Optimizer uses some basic assumptions about the cardinality of the data without statistics and can take much more aggressive approach to the data if the statistics are available.

This statement is also very true of all secondary indexes as well, if you desire that the Optimizer use them.

I'm sure that this is much more to be said on this topic, but time limits that being said at this time.

God Luck,

Eric

EJK
Eric J. Kohut
Senior Solutions Consultant - Teradata Solutions Group - Retail
Certified Teradata Master
NCR Corp.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference