|
|
Archives of the TeradataForum
Message Posted: Fri, 19 Oct 2001 @ 15:15:35 GMT
Subj: | | Re: Insert select with sample |
|
From: | | Geoffrey Rommel |
| I've attached the piece of the explain that indicates that it'll read the whole table into spool. Does this make sense? I would
have expected it to only load into spool the sample amount, but it SAMPLES spool, not the table. | |
And later:
| the sample 500000 still reads all the rows into spool, at least that's what the explain indicates. | |
I don't know this for a fact, but I strongly suspect that Teradata is using one of the following two algorithms for sampling:
1. The Fan-Muller-Rezucha algorithm, described in Donald Knuth, Seminumerical Algorithms, algorithm 3.4.2S. This algorithm
unfortunately requires the exact number of input rows to be known, so it would require at least one full table scan.
2. More likely they are using the Waterman "reservoir sampling" algorithm (Knuth, alg. 3.4.2R). This would not require the
entire table to be read into spool, but it does require a full table scan and a small reservoir. (Knuth discusses the size needed.)
I agree that it seems unnecessary to read the entire table into spool, at least if the second algorithm is being used. The Explain
is not very informative. I've often wondered about this myself. Developers?...
--wgr
| |