 

Archives of the TeradataForum

Message Posted: Sat, 11 Jun 2005 @ 08:00:45 GMT


     


Subj:   Re: statistics AFTER reconfig ?
 
From:   Victor Sokovin

  Only that kind of information is stored within statistics. Would you expect different result sets for different configurations when you submit a "select col, count(*) from table group by 1;"? This is the first step of a Collect Stats, then the result set is chopped into up to 100 intervals.  


  If it really were configuration dependent, then the System Emulation Tool would be useless :-)  


First of all, thanks for the thoughts on this.

We have to be careful with the two definitions of data skew apparently used by TD.

I have no problem with the above. Indeed, the basic stats such as the number of distinct values per column are of course properties of the data in the column, and the results should be independent of the method used. Number of AMPs has nothing to do with it. You could even calculate these stats on another database. So far so good.
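To make the point concrete, here is a minimal sketch in Python rather than on a real system. It assumes a made-up hash (`amp_of`) and a simplified equal-height split (`chop`); neither is Teradata's actual algorithm. It only shows that merging per-AMP counts gives the same value frequencies, and hence the same intervals, whatever the number of AMPs.

```python
from collections import Counter
import zlib

def amp_of(value, n_amps):
    # Hypothetical stand-in for a row hash; NOT Teradata's hashing algorithm.
    return zlib.crc32(str(value).encode()) % n_amps

def value_counts(rows, n_amps):
    # Each simulated AMP counts its local rows, then the partial counts are
    # merged, like "select col, count(*) from table group by 1".
    local = [Counter() for _ in range(n_amps)]
    for v in rows:
        local[amp_of(v, n_amps)][v] += 1
    merged = Counter()
    for partial in local:
        merged.update(partial)
    return sorted(merged.items())

def chop(counts, max_intervals=100):
    # Simplified equal-height split (an assumption made for illustration).
    total = sum(n for _, n in counts)
    target = total / min(max_intervals, len(counts))
    intervals, current, rows_in = [], [], 0
    for value, n in counts:
        current.append(value)
        rows_in += n
        if rows_in >= target:
            intervals.append((current[0], current[-1], rows_in))
            current, rows_in = [], 0
    if current:
        intervals.append((current[0], current[-1], rows_in))
    return intervals

# Illustrative data: 250 distinct values plus one heavily repeated value.
rows = [i % 250 for i in range(10_000)] + [7] * 500
hist_4 = chop(value_counts(rows, 4))   # "before" configuration: 4 AMPs
hist_7 = chop(value_counts(rows, 7))   # "after" configuration: 7 AMPs
print(hist_4 == hist_7)                # True
```

The AMP count only changes where the rows sit while they are being counted; the merged result is a property of the data alone.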

However, the story with histograms and their types, sampling methods, estimates for loners, etc. seems to depend on the second data skew definition:


  - Partition skew refers to skew that results from an uneven distribution of data across the AMPs.  


  The difference is apparent from the context. As used in this manual, the term usually refers to the partition skew that occurs when the primary index for a table is defined on a column set that is highly nonunique."  


As this definition depends on the system configuration, every assumption or estimate based on it becomes system dependent.
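A small sketch of why partition skew is configuration dependent. The hash here (value modulo AMP count) and the "skew factor" (busiest AMP divided by the average) are my own simplifications for illustration, not TD's hashing or any official metric. The same highly nonunique primary index values give a different skew on 2 AMPs than on 4.

```python
def rows_per_amp(pi_values, n_amps):
    # Hypothetical stand-in for row hashing: value modulo AMP count.
    per_amp = [0] * n_amps
    for v in pi_values:
        per_amp[v % n_amps] += 1
    return per_amp

def skew_factor(pi_values, n_amps):
    # Assumed measure for illustration: rows on the busiest AMP / average rows per AMP.
    counts = rows_per_amp(pi_values, n_amps)
    return max(counts) / (sum(counts) / n_amps)

# Highly nonunique primary index: only three distinct values in 10,000 rows.
pi_values = [0] * 6000 + [1] * 3000 + [2] * 1000

for n in (2, 4):
    print(n, rows_per_amp(pi_values, n), round(skew_factor(pi_values, n), 2))
# 2 [7000, 3000] 1.4
# 4 [6000, 3000, 1000, 0] 2.4
```

The data did not change between the two runs, only the number of AMPs, so anything derived from the per-AMP distribution can change after a reconfig.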


  But maybe there could be a minor difference for Sample Stats...  


Yes, that's what I thought as well. Sample stats are likely to be dependent on the system (modulo the assumptions I am making or have to make where the manual is not clear). Whether this is a minor or a major dependence, I don't know in general. I guess it depends on the system, the data, how skewed the data was before and after the upgrade, etc.

I would not bet that the normal stats are totally independent of the system configuration either. A single IF ... THEN somewhere in the code would be enough to introduce a dependency.

What can be verified here are the results before and after the upgrade. Not many shops, however, would plan such a check. There is usually enough to do besides the academic side, but who knows, perhaps one day somebody on the list will have time for this.


Regards,

Victor



     
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023