Archives of the TeradataForum
Message Posted: Fri, 19 Oct 2001 @ 15:15:35 GMT
I don't know this for a fact, but I strongly suspect that Teradata is using one of the following two algorithms for sampling:
1. The Fan-Muller-Rezucha algorithm, described in Donald Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, section 3.4.2, Algorithm S. Unfortunately this algorithm needs to know the exact number of input rows before it starts selecting, so it would require at least one full table scan.
2. More likely they are using the Waterman "reservoir sampling" algorithm (Knuth, alg. 3.4.2R). This would not require the entire table to be read into spool, but it does require a full table scan and a small reservoir. (Knuth discusses the size needed.)
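To make the difference between the two concrete, here is a minimal Python sketch of both ideas. It says nothing about what Teradata actually does internally. The function names, the `rng` parameter, and the use of an in-memory list as the reservoir are my own assumptions for illustration, and the reservoir version is the common simplified form that keeps exactly n rows, not Knuth's full formulation.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def selection_sample(rows: Iterable[T], total: int, n: int,
                     rng: random.Random) -> Iterator[T]:
    """Selection sampling in the style of Algorithm S.

    `total` must be the exact row count, known before the pass starts.
    """
    seen = 0    # rows examined so far
    chosen = 0  # rows selected so far
    for row in rows:
        if chosen == n:
            break  # sample is complete, the rest need not be read
        # Select this row with probability (n - chosen) / (total - seen).
        if rng.randrange(total - seen) < (n - chosen):
            chosen += 1
            yield row
        seen += 1


def reservoir_sample(rows: Iterable[T], n: int,
                     rng: random.Random) -> List[T]:
    """Simplified reservoir sampling in the style of Algorithm R.

    The row count is not needed in advance; only n rows are held at once.
    """
    reservoir: List[T] = []
    for t, row in enumerate(rows, start=1):
        if t <= n:
            reservoir.append(row)   # fill the reservoir first
        else:
            j = rng.randrange(t)    # uniform integer in [0, t)
            if j < n:
                reservoir[j] = row  # replace a random slot
    return reservoir


if __name__ == "__main__":
    rng = random.Random(2001)
    print(list(selection_sample(range(1000), 1000, 5, rng)))
    print(reservoir_sample(iter(range(1000)), 5, rng))
```

Note that the first function has to be handed `total` up front, while the second only ever holds n rows and reads each input row once. That is why the second would not need the whole table in spool.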
I agree that it seems unnecessary to read the entire table into spool, at least if the second algorithm is being used. The Explain is not very informative. I've often wondered about this myself. Developers?...
|Last Modified: 27 Dec 2016|