Archives of the TeradataForum

Message Posted: Thu, 03 Aug 2006 @ 12:55:32 GMT



Subj:		Re: Recommendations from Statistics Wizard

From:		Michael Larkins

Martin:

As the SAMPLE designation implies, it only looks at a portion of the data, whereas a normal collection looks at the actual values of all rows. You should only use SAMPLE to speed up the collection, and only if the data is evenly distributed across your AMPs and you have a lot of it (hundreds of millions or billions of rows).

What it is going to do is read some of the data (again, SAMPLE) and then extrapolate what it finds to all the data and all the AMPs. In other words, it mathematically calculates what it thinks would be a normal distribution of values based on what it saw in less than all the rows. It is like looking at an EXPLAIN for a small table that has no statistics: the optimizer thinks a table with 12 rows in it has 100 rows because that is the number of AMPs. If it found 1 row on 1 AMP, the assumption is that each AMP has 1 row and, of course, 1*100=100, not 12.
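To make the arithmetic above concrete, here is a toy sketch in Python. It is not Teradata's actual estimation algorithm; the function name and constants are made up for illustration, and it only reproduces the "rows seen on one AMP times number of AMPs" reasoning from the 12-row example.

```python
# Toy illustration only: NOT how Teradata computes statistics internally.
# It mirrors the 12-row example: 1 row seen on 1 AMP, scaled up to 100 AMPs.

def extrapolate_row_count(rows_seen_on_one_amp: int, amp_count: int) -> int:
    """Assume every AMP holds as many rows as the one AMP that was examined."""
    return rows_seen_on_one_amp * amp_count

AMP_COUNT = 100    # number of AMPs in the example system (assumed)
ACTUAL_ROWS = 12   # true size of the small table in the example

estimate = extrapolate_row_count(1, AMP_COUNT)
error = estimate - ACTUAL_ROWS

print(f"estimated rows: {estimate}, actual rows: {ACTUAL_ROWS}, off by {error}")
```

The estimate is only as good as the assumption that every AMP looks like the part that was read, which is why uneven distribution hurts sampled statistics so much.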

Do not expect USING SAMPLE to be accurate or, like now, you will be disappointed. Expect it to be a guess, but a faster way of guessing statistics for a table whose rows are evenly distributed. I believe you will find a similar explanation in the manual.

Regards,

Michael Larkins
Certified Teradata Master
Certified Teradata SQL Instructor


