Re: Difference between collect stats on a column vs. a table [From John

Message Posted: Wed, 03 Aug 2011 @ 14:22:18 GMT

Subj:

Re: Difference between collect stats on a column vs. a table

From:

John_Wight

The physical differences between the two approaches are pretty well know and have been discussed in this chain. However, the real question needs to be asked is - WHEN? i.e. When is it time to collect stats on a table/column? Just to collect stats after each ETL load/update will consume a lot of 'unnecessary' CPU cycles but we you don't collect often enough, then we raise the risk of degraded performance with stale stats.

Over the years, it has become kind of general 'rule/guideline' that when the size of the table changes +/- 10% in size - it might be a good time to recollect stats. Yeah, we know there will be differences of opinion on this but that has been the general guideline that we've always worked with. Does this mean that every column needs to have them refreshed? Probably not. However, show me anyone that knows exactly when a particular column needs refreshing and can build that logic into their stats processes and you probably have a VERY exceptional individual with too much time on their hands;-)

Also, what about tables that are constantly being updated 24/7? I've known customers with active EDW's that do this, so the question then is - when do you (again) make that decision and, if your are a 'column' person, it becomes even more difficult. So, some years ago we decided to create an 'environment' where a process can be run periodically to check and see if a particular table's size has changed 10% (+/-) and if so, then collect stats at the table level. Not being a genius to figure out what column(s) to collect, we just collect at the table level. The 10% is only a 'default' and any table could have a different threshold % and there are cases where you may need that.

We've also found that Stats are one thing that are usually not dealt with properly in that there are often too many not needed stats and this causes more problems that just column -vs- table level stats!!! Only good development/testing processes will determine the proper stats required and that in itself is a separate issue we have to deal with. We all know that Stats are critical for good performance and it is one of those 'primitives' that Teradata provides without any good tools or management facilities - so you have to create our own.

In our current environment, we execute this 'process' after each table load/update and we only do the stats collection if the threshold exceeds 10% - or - stats have not been collected in X number of days. We would content that this is a more optimal approach than most we've seen, BUT, would be interested if anyone else has created a more optimal approach. I'm sure they exist!

Again, for what it's worth on today's market . . .

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Wed, 03 Aug 2011 @ 14:22:18 GMT