Archives of the TeradataForum

Message Posted: Sat, 25 Feb 2006 @ 09:58:16 GMT


     


Subj:   Re: Adding Nodes to a System: Re-Collection of Stats necessary?
 
From:   Victor Sokovin

  Statistics are not dependent on the machine configuration and are thus not affected by an event such as a reconfig. Statistics are built by creating 100 ranges and then collecting some information about the values within each range.  
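
To make the "100 ranges" description concrete, here is a minimal sketch in Python of an equal-height interval summary built from column values alone. It is only an illustration of the quoted description, not Teradata's actual collection algorithm: the function name `build_intervals` and the fields kept per interval (`max_value`, `rows`, `distinct_values`) are assumptions chosen for this example. It also says nothing about whether other estimates made during collection depend on the configuration.

```python
from typing import Any, Dict, List


def build_intervals(values: List[Any], n_intervals: int = 100) -> List[Dict[str, Any]]:
    """Split the sorted values into up to n_intervals ranges of roughly equal row count.

    Hypothetical illustration only: each interval records its highest value,
    its row count, and its number of distinct values.
    """
    ordered = sorted(values)
    total = len(ordered)
    if total == 0:
        return []
    # Never create more intervals than there are rows, so no interval is empty.
    n = min(n_intervals, total)
    intervals = []
    for i in range(n):
        start = i * total // n
        end = (i + 1) * total // n
        chunk = ordered[start:end]
        intervals.append({
            "max_value": chunk[-1],
            "rows": len(chunk),
            "distinct_values": len(set(chunk)),
        })
    return intervals


if __name__ == "__main__":
    # Hypothetical column data; nothing here refers to nodes or AMPs.
    summary = build_intervals([v % 500 for v in range(10_000)])
    print(len(summary), summary[0], summary[-1])
```

In this simplified model the input is just the list of values, so the result cannot change with the hardware. Whether that holds for everything a real collection stores is exactly the point under discussion.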


Jeff, I don't know whether you posted before me last night, but in my posting I referred to last summer's thread, where I put together some evidence that leads me to believe that some estimates of data distribution made during statistics collection may well depend on the machine configuration.

This is the academic part of the discussion. I, for one, have not yet heard any argument that would convert me to your theory, Jeff.


  Please remember that if your stats are old, the effect of those old stats may be magnified with a reconfig. Therefore, one might see an improvement by collecting stats within the system. If your stats are up to date, no collection of stats is necessary upon the reconfiguration of your system.  


"The effect of those old stats may be magnified with a reconfig" is the phrase I could use some help with. If you can, please zoom in on it. For the moment, though, I'll take your word for it. If this is true, then to have everything covered, statistics must be re-collected on the old configuration just before the reconfig, because some process can always modify, say, 15% of the data in an important table at the last minute.

If the manager responsible for the reconfig does not collect stats after the last data modification, he/she can always be accused later with an argument similar to yours, Jeff: "See? This table and that table were modified after you collected stats on them!"

I'd say to somebody who has doubts about this whole theory of reconfig: don't collect stats on the old configuration; save the time that operation would take and start your reconfig earlier. After the reconfig, collect all the stats on the new configuration (this process may be faster now). That leaves no risk in this area.

So, even from this purely practical point of view, it makes sense to collect stats after the reconfig.

> To see the data that gets collected when you run stats run the following command:


          > help stats  column ;
  or appropriate variations.  


  Please note that the output is different from the typical help stats command.  


Isn't there more behind statistics? Try one of the statements Dieter (and Terry?) distribute regularly on this forum; they should give more insight.


  As you will see, there is nothing in there that will change with a reconfiguration of a system.  


Perhaps not, but there is more to this subject than the above HELP command covers.

The real test would be:

1. collect all stats on the old configuration;

2. run a good set of representative queries and capture their run times;

3. run Dieter/Terry SQL;

4. reconfig;

5. run the same benchmark queries and capture their run times; perhaps run Dieter/Terry SQL to see whether something changed - who knows?

6. collect all stats on the new configuration;

7. run Dieter/Terry SQL;

8. run the benchmark.
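
The run times captured in steps 2, 5 and 8 could be compared with a small script. What follows is a minimal sketch in Python, assuming the timings have already been gathered into dictionaries keyed by a query label. The function name `compare_runs`, the labels `q1`/`q2`, and all the numbers are hypothetical placeholders, not measurements from any system.

```python
from typing import Dict


def compare_runs(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return the relative change in run time per query (after vs. before).

    A value of -0.25 means the query ran 25% faster; +0.10 means 10% slower.
    Only queries present in both runs with a positive baseline are compared.
    """
    changes = {}
    for query, old_time in before.items():
        if query in after and old_time > 0:
            changes[query] = (after[query] - old_time) / old_time
    return changes


if __name__ == "__main__":
    # Hypothetical timings in seconds; replace with the captured values.
    step2 = {"q1": 120.0, "q2": 40.0}  # old configuration, fresh stats
    step5 = {"q1": 90.0, "q2": 44.0}   # new configuration, stats not re-collected
    step8 = {"q1": 60.0, "q2": 30.0}   # new configuration, stats re-collected

    comparisons = (
        ("step 2 -> step 5", step2, step5),
        ("step 5 -> step 8", step5, step8),
    )
    for label, base, run in comparisons:
        for query, change in sorted(compare_runs(base, run).items()):
            print(f"{label}  {query}: {change:+.1%}")
```

The step 5 versus step 8 comparison is the interesting one: both runs are on the new configuration, so any difference there is attributable to the re-collection rather than to the added nodes.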


Quite a program to test in a real-life situation, but if somebody has the time and resources ...

Another twist to this topic is the Active DWH, where everything is on-line, including the reconfig and stats collection. I wonder how fast the reconfig takes place on such systems. Any issues there?


Regards,

Victor



     
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023