Archives of the TeradataForum

Message Posted: Fri, 19 Oct 2001 @ 15:15:35 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Insert select with sample

From:		Geoffrey Rommel

I've attached the piece of the explain that indicates that it'll read the whole table into spool. Does this make sense? I would have expected it to only load into spool the sample amount, but it SAMPLES spool, not the table.

And later:

the sample 500000 still reads all the rows into spool, at least that's what the explain indicates.

I don't know this for a fact, but I strongly suspect that Teradata is using one of the following two algorithms for sampling:

1. The Fan-Muller-Rezucha algorithm, described in Donald Knuth, Seminumerical Algorithms, algorithm 3.4.2S. This algorithm unfortunately requires the exact number of input rows to be known, so it would require at least one full table scan.

2. More likely they are using the Waterman "reservoir sampling" algorithm (Knuth, alg. 3.4.2R). This would not require the entire table to be read into spool, but it does require a full table scan and a small reservoir. (Knuth discusses the size needed.)

I agree that it seems unnecessary to read the entire table into spool, at least if the second algorithm is being used. The Explain is not very informative. I've often wondered about this myself. Developers?...

--wgr


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference