Archives of the TeradataForum
Message Posted: Sat, 11 Jun 2005 @ 08:00:45 GMT
Subj: Re: statistics AFTER reconfig ?
From: Victor Sokovin
| Only that kind of information is stored within statistics. Would you expect different result sets for different configurations when you
submit a "select col, count(*) from table group by 1;"? This is the first step of a Collect Stats, then the result set is chopped into up to 100
intervals. | |
| If it really was configuration dependent then the System Emulation Tool would be useless :-) | |
First of all, thanks for the thoughts on this.
We have to be careful with the two definitions of data skew apparently used by TD.
I have no problem with the above. Indeed, the basic stats such as the number of distinct values per column are of course properties of the data
in the column, and the results should be independent of the method used. Number of AMPs has nothing to do with it. You could even calculate these
stats on another database. So far so good.
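To make this concrete, here is a minimal sketch of the two steps described in the quote: a group-by count, then chopping the result into at most 100 intervals. The function name, the equal-height splitting rule and the interval fields are my own assumptions for illustration, not Teradata's actual algorithm. Note that nothing in it knows about AMPs.

```python
from collections import Counter


def collect_stats_sketch(values, max_intervals=100):
    """Hypothetical illustration only, not Teradata's real Collect Stats."""
    # Step 1: the equivalent of "select col, count(*) from table group by 1"
    freq = sorted(Counter(values).items())
    if not freq:
        return []

    # Step 2: chop the sorted result into at most max_intervals intervals,
    # aiming for roughly equal row counts per interval (an assumption).
    total_rows = sum(count for _, count in freq)
    n_intervals = min(max_intervals, len(freq))
    target = total_rows / n_intervals

    intervals, current, rows_in_current = [], [], 0
    for value, count in freq:
        current.append((value, count))
        rows_in_current += count
        if rows_in_current >= target and len(intervals) < n_intervals - 1:
            intervals.append(current)
            current, rows_in_current = [], 0
    if current:
        intervals.append(current)

    return [
        {
            "min": iv[0][0],
            "max": iv[-1][0],
            "distinct": len(iv),
            "rows": sum(c for _, c in iv),
        }
        for iv in intervals
    ]


column = [i % 250 for i in range(10_000)]
histogram = collect_stats_sketch(column)
print(len(histogram), sum(iv["distinct"] for iv in histogram))
```

The input is just the column's values, so the same data gives the same intervals no matter how the rows are stored or ordered.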
However, the story with histograms and their types, sampling methods, estimates for loners etc seems to depend on the second data skew
definition:
| - Partition skew refers to skew that results from an uneven distribution of data across the AMPs. | |
| The difference is apparent from the context. As used in this manual, the term usually refers to the partition skew that occurs when the
primary index for a table is defined on a column set that is highly nonunique." | |
As this definition depends on the system configuration, every assumption or estimate based on it becomes system dependent.
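A rough sketch of why partition skew is system dependent: the same highly nonunique primary index values are spread over different numbers of AMPs and the skew is measured each time. The MD5-based hash is only a stand-in (Teradata's real row-hash function is different), and `amp_for`, `rows_per_amp` and `skew_ratio` are hypothetical names of mine.

```python
import hashlib
from collections import Counter


def amp_for(value, n_amps):
    # Stand-in hash for illustration; NOT Teradata's row-hash algorithm.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_amps


def rows_per_amp(pi_values, n_amps):
    # Count how many rows land on each AMP, including AMPs that get none.
    counts = Counter(amp_for(v, n_amps) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(n_amps)]


def skew_ratio(per_amp):
    # Busiest AMP's row count divided by the average per AMP (1.0 = even).
    return max(per_amp) / (sum(per_amp) / len(per_amp))


# Highly nonunique primary index: 5 distinct values over 10,000 rows.
pi_values = [i % 5 for i in range(10_000)]

for n_amps in (4, 10, 40):
    print(n_amps, round(skew_ratio(rows_per_amp(pi_values, n_amps)), 2))
```

The number of distinct values stays at 5 throughout, while the skew ratio changes with the AMP count. That is the difference between the two definitions.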
| But maybe there could be a minor difference for Sample Stats... | |
Yes, that's what I thought as well. Sample stats are likely to be dependent on the system (modulo the assumptions I am making or have to make
when the manual is not clear). Whether this is a minor or a major dependence, I don't know in general. I guess it depends on the system, the
data, how skewed it was before and after the upgrade etc.
I would not bet that the normal stats are totally free of the system configuration either. One IF ... THEN somewhere in the code would be
enough to get a dependency.
What can be verified here are the results before and after the upgrade. Not many shops, however, would plan such a check. There is usually
enough to do besides the academic side, but who knows, perhaps one day somebody on the list will have time for this.
Regards,
Victor