Archives of the TeradataForum
Message Posted: Mon, 19 Nov 2012 @ 13:39:13 GMT
Sorry, there is no straight answer to your question. As usually, there's some "it depends" involved.
Typically, there should be no performance degradation. UNLESS you hit the histogram limits as mentioned earlier.
I repeat myself: Prior TD14, there was a 16-byte limit on the value size of the statistics histogram. So if you collect on large columns or multicolumns, only first 16 bytes are considered in the histogram. This can fool the optimizer. Especially when those first 16 bytes are not distinct enough, it might choose a non-optimal execution plan. This would obviously result in degraded performance of your queries.
Imagine you have two CHAR(12) columns (colA, colB) on which you collect multicolumn statistics.
Now, for some reason, colA is always equal to '999999999999'.
ColB is a unique identifier, typically in the following form: '00001234abcd'
When you collect stats on (colA, colB), the information the optimizer will get is, that this set's value is always '9999999999990000', because of the 16-bytes limit. That's highly non-unique and the optimizer will behave accordingly, even though the (colA,colB) set itself is unique.
Just FYI, switching the order of columns in your COLLECT STATS statement doesn't change anything. I believe the order is evaluated based on the order of columns in the CREATE TABLE statement.
Bottom line, this is academically speaking. In the real world, I strongly discourage you to collect statistics on every column and every combination of multicolumn sets. It is a very bad idea, even if you don't care about the resources consumed for recollection. And if I were you, I would really care about these as well.
Also note that the histogram limit is not infinite in TD14, it just has been lifted and is adjustable.
Hope this helps.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|