Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 11 Dec 2007 @ 13:34:33 GMT


     
  <Prev Next>   <<First <Prev
Next>
Last>>
 


Subj:   Re: Collect Stats on PI: Index vs Column
 
From:   Victor Sokovin

  I know it's not supposed to be this way, but I have noticed anecdotally (pre-v2r6, haven't checked on this version) that the optimizer often made better decisions based on stats on an index on a column than it did stats on just the column itself, even if it did not then use the index for selection.  


  This makes me do basic things like always making sure FK columns have an index and stats, because it can make the difference between a small table selection or not. Rule of thumb should be, try stats on column, then try index/stats on the column, and see which one works best. If it's a wash, just keep the stats.  


Dave, I seem to recall discussions on this topic just after the first R5 versions went in production. I even went into the trouble of reading the manuals explaining multi-column stats and the stats collected on samples. My impression was that the algorithms (or rather their intentionally vague descriptions) used for all kinds of statistical estimates were indeed different in different circumstances. One of my conclusions, for example, was that the INDEX syntax should clearly make a difference for at least sampled stats.

In this type of discussions we are often told *where* the stats are stored but I never saw confirmations that exactly the same information would be stored for the "column/stats" and "index/stats" scenarios. If you really have cases when the optimizer makes different decisions it would be interesting to compare the stats and see the differences. Then we could say: "See? This and that is different." Otherwise, they would never believe. I must admit I was too lazy to stage such experiments, especially after discovering that the sampled stats were not - shall we say - really interesting to spend the time on (as I said, I thought it would be easier to demonstrate the differences for sampled stats).

Will just add that manuals are written in parallel with the actual implementation of the release and some last-minute changes / cancellations in the implementation process may not always make it into the manuals. I think we all can come up with examples of this kind. Perhaps this is what happened in this case as well and it left the subject open to different interpretations.


Regards,

Victor



     
  <Prev Next>   <<First <Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023