Archives of the TeradataForum
Message Posted: Wed, 15 Feb 2007 @ 00:05:04 GMT
Subj: Re: How Do You Measure/Validate Compression Savings?
From: Walter, Todd A
| "Even if the columns get loaded compressed into the spool you can still do all kinds of operations on them (take simple concatenation, for
example) and then you can compare thus built "derived" columns with, say, some other compressed columns. In order to compare things like that some
uncompression seems to be very likely, and I don't see how uncompression does not affect CPU usage." | |
This describes what users may typically do with compressed columns after loading the data into spool, and I wonder whether the Teradata code consistently uses a "pointer reference to the value," or whether it may occasionally slip into copying and duplicating the values as the complexity of operations increases (the latter would mean an increase in resource consumption).
The code is of course not public, and it is difficult to get access to a lab capable of measuring all the effects, so I personally reserve the right to doubt the end result. But maybe you can confirm that, in the scenarios I describe here, Teradata will consistently use pointers? That would be interesting to know.
[taw]
Prior to V2R6.1, spools have no compression. Thus, as the data is accessed from the base tables, the value is copied, from either the row or the compress value list as described above, to fill the value into the spool row. There is no additional cost here; we have to copy the value from somewhere anyway.
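
As a rough illustration of this pre-6.1 path (a hypothetical Python sketch with invented names, not Teradata's actual code): a row carries either the literal value or a small code indexing the compress value list kept in the table header, and filling the uncompressed spool row materializes the full value either way.

    # Hypothetical sketch of the pre-V2R6.1 path. A base-table row holds either
    # a literal value or a small code into the table header's compress value
    # list; building the uncompressed spool row copies the full value either way.
    compress_value_list = ["NY", "CA", "TX"]     # values declared COMPRESS on the column

    def fill_spool_row(code, literal):
        if code is not None:                     # compressed: copy from the value list
            return compress_value_list[code]
        return literal                           # uncompressed: copy from the row body

    print(fill_spool_row(1, None))               # -> CA, copied from the value list
    print(fill_spool_row(None, "FL"))            # -> FL, copied from the row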
With 6.1 we introduced spool compression. If the column is copied as-is (no expressions applied against it) and it is a compressed value, then we copy just the compressed bits into the spool file, saving CPU and I/O. Once in the spool, it works just like a base table: there is a compress value list in the table header of the spool, which lives in memory while we are operating on the spool.
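
A hypothetical sketch of that fast path (again invented Python names, not the real internals): when the column is projected unchanged, only the small compressed code is copied into the spool row, and the spool carries its own compress value list in its header; applying an expression forces the value to be materialized first.

    # Hypothetical sketch of the 6.1 spool-compression fast path.
    compress_value_list = ["NY", "CA", "TX"]

    class Spool:
        def __init__(self, value_list):
            # The spool's table header carries the compress value list and
            # stays in memory while the spool is being operated on.
            self.header_value_list = value_list
            self.rows = []

    spool = Spool(compress_value_list)

    def copy_column_to_spool(code, literal, expression_applied):
        if code is not None and not expression_applied:
            spool.rows.append(("code", code))    # copy just the compressed bits
        else:
            # An expression (e.g. concatenation) needs the real value first.
            value = compress_value_list[code] if code is not None else literal
            spool.rows.append(("value", value))

    copy_column_to_spool(2, None, expression_applied=False)  # stays compressed
    copy_column_to_spool(2, None, expression_applied=True)   # decoded for the expression
    print(spool.rows)                            # [('code', 2), ('value', 'TX')]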
The key to the whole discussion is that we do not have to uncompress blocks, or other large chunks of data, to access one row or one value. This really removes the significant CPU-cost tradeoff common to most compression implementations.
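
To make the contrast concrete (an illustrative sketch only, using Python's zlib as a stand-in for generic block compression): reading one value from a compressed block costs a whole-block decode, whereas a value-list scheme resolves it with a single in-memory lookup per row.

    import zlib

    values = ["NY", "CA", "NY", "TX", "NY"]

    # Block compression: the whole block must be inflated to read row 3.
    block = zlib.compress(",".join(values).encode())
    row3_block = zlib.decompress(block).decode().split(",")[3]

    # Value-list compression: row 3 is one array index into the header's list.
    value_list = ["NY", "CA", "TX"]
    codes = [0, 1, 0, 2, 0]                      # one small code per row
    row3_list = value_list[codes[3]]             # no bulk decompression

    assert row3_block == row3_list == "TX"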