Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Fri, 19 Oct 2001 @ 15:15:35 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Insert select with sample
 
From:   Geoffrey Rommel

  I've attached the piece of the explain that indicates that it'll read the whole table into spool. Does this make sense? I would have expected it to only load into spool the sample amount, but it SAMPLES spool, not the table.  


And later:

  the sample 500000 still reads all the rows into spool, at least that's what the explain indicates.  



I don't know this for a fact, but I strongly suspect that Teradata is using one of the following two algorithms for sampling:

1. The Fan-Muller-Rezucha algorithm, described in Donald Knuth, Seminumerical Algorithms, algorithm 3.4.2S. This algorithm unfortunately requires the exact number of input rows to be known, so it would require at least one full table scan.

2. More likely they are using the Waterman "reservoir sampling" algorithm (Knuth, alg. 3.4.2R). This would not require the entire table to be read into spool, but it does require a full table scan and a small reservoir. (Knuth discusses the size needed.)


I agree that it seems unnecessary to read the entire table into spool, at least if the second algorithm is being used. The Explain is not very informative. I've often wondered about this myself. Developers?...


--wgr



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023