Home Page for the TeradataForum

Archives of the TeradataForum

Message Posted: Tue, 29 Apr 2003 @ 13:48:26 GMT

  <Prev Next>   <<First <Prev Next> Last>>  

Subj:   Re: Collect statistics..
From:   John Hall

As long as you don't drop the table and re-create it, the statistics (from the last time they were collected) will remain on the table. So if you don't have a *really* good reason to drop and create the table, then do a DELETE ALL from your table and then re-populate. Doing the DELETE ALL is just as quick as drop/create and you'll preserve the last statistics.

If you feel that dropping the table and re-creating the table is your best choice, don't forget to add the impact of re-collecting statistics to your decision. Dropping and re-creating can be pretty expensive when it comes to the total cost.

Sankar is right - on large tables you need to perform your collect statistics on purpose rather than using a simple strategy like performing the collects every time you load the table. When collecting statistics on large tables, it's the behaviour (or characteristics) of your data that drives the strategy for collect statistics. Collecting statistics at the table level is probably fine for small tables, but it's a bad approach for large tables.

When faced with large tables, I generally recommend that statistics only need to be collected when roughly 10% of the values in a column have changed. My background has largely been in retailing and so assuming a fully populated daily sales table (there are all kinds of exceptions to the following points, but they should give you the idea):

- The column DATE_SOLD only needs to have statistics collected once every 36 days if you keep history for a full year (10% of 365 days). If you have history of two years, then you can probably get by with doing a collect statistics on DATE_SOLD once every 73 days.

- The column UPC probably only needs to have statistics collected once a year (UPCs are relatively stable and so the frequency of collect statistics is driven by changes in the type of your inventory - how often do you change 10% of business from one thing to another?).

- The column STORE_NO only needs to be collected when 10% of your store locations have changed. If you've got a fast growing business, then you might need to do collects on this column once or twice a year. A slow growing business may only need the collects once every couple of years.

- If all the stores are pretty similar, then the distribution of values (high, low, median) for columns like QTY_SOLD are probably pretty consistent year after year and so you could probably make the collects very infrequent - maybe once or twice every couple of years.

- If your PI on the daily sales table consists of DATE_SOLD, UPC and STORE_NO, then it's probably the change in DATE_SOLD that governs how often you need to collect statistics on the PI.

Just a final point: 10% is a rule of thumb. Some situations might be more sensitive to data demographics. In other cases, you might be able to get away with 20% or more. You'll have to determine your situation empirically and 10% isn't a bad place to start.

OK, another final point: If you're in a situation where you absolutely can't collect statistics as needed, then you need to change your expectations of what the machine can do for you. If you have defined an environment where the parser/optimizer can't get the information it needs (via statistics) to build optimal work plans, then you've chosen to live with the fact that your work plans will always be problematic.

  <Prev Next>   <<First <Prev Next> Last>>  
  Top Home Privacy Feedback  
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016