Archives of the TeradataForum
Message Posted: Mon, 22 Oct 2007 @ 10:27:07 GMT
I'm writing a stored procedure (well, actually, I've written it) which works out the break point for the number of values to compress on for a table, column by column.
At the heart of it, however, I wonder whether I've over simplified it or got it just about right.
The way I'm tackling it is to cumulatively add up the space taken up by the column, based upon its size (for example, four bytes for an INTEGER, etc), multiplied by the number of rows that 'attract' the number of compression values under consideration, which would be 1, 3, 7, 15, 31, 63, 127 and lastly 255.
Obviously, these values are based upon the most prevalent values in the table.
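To make that cumulative arithmetic concrete, here is a small sketch of it (the column size and row counts are made-up figures of mine, not taken from the procedure):

```python
# Cumulative space saved by compressing the top-N most prevalent values.
col_size = 4                                # bytes for an INTEGER column
rows_per_value = [50000, 30000, 12000, 5000]  # rows attracted by each value,
                                              # most frequent first (hypothetical)

cumulative_saving = {}
for n in (1, 3, 7, 15):                     # candidate value counts: 2**k - 1
    # Every row holding one of the top-n values drops col_size bytes.
    cumulative_saving[n] = col_size * sum(rows_per_value[:n])
    print(n, cumulative_saving[n])
```

With only four prevalent values in the list, the saving flattens out once n exceeds the number of values actually present.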
From this, I then calculate the overhead taken up by the cumulative figure by using the following simple algorithm:
    SET BIT_CT = CASE vFor2.COL_ROW_POS
                     WHEN 1   THEN 1
                     WHEN 3   THEN 2
                     WHEN 7   THEN 3
                     WHEN 15  THEN 4
                     WHEN 31  THEN 5
                     WHEN 63  THEN 6
                     WHEN 127 THEN 7
                     WHEN 255 THEN 8
                     ELSE 0
                 END;

    SET BIT_OVHEAD = BIT_CT / 8.000;
    SET SPACE_OVHEAD = BIT_OVHEAD * TAB_CT;
So, as you can see, I'm calculating overhead as a fraction of one byte, multiplied by the total number of rows in the table.
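The same overhead calculation can be sketched in Python; note that the CASE mapping is just ceil(log2(n + 1)) presence bits for n compressed values (variable names and the row count below are mine):

```python
import math

tab_ct = 100000  # total rows in the table (hypothetical)

def bit_ct(n_values):
    # n compressed values need ceil(log2(n + 1)) presence bits per row,
    # matching the CASE above: 1 -> 1 bit, 3 -> 2, 7 -> 3, ... 255 -> 8.
    return math.ceil(math.log2(n_values + 1))

space_overhead = {}
for n in (1, 3, 7, 15, 31, 63, 127, 255):
    # Fraction of a byte per row, multiplied across every row in the table.
    space_overhead[n] = bit_ct(n) / 8.0 * tab_ct
    print(n, space_overhead[n])
```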
The final calculation is a simple comparison between the space saving derived from a _particular cumulative set_ against the overhead taken up _across all rows_.
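Putting the two halves together, the break-even comparison might look like this sketch (all figures hypothetical; the procedure itself works differently, column by column):

```python
import math

col_size = 4                                # bytes for an INTEGER column
tab_ct = 100000                             # total rows in the table
rows_per_value = [40000, 9000, 800, 200]    # rows per prevalent value (made up)

best = (0, 0)   # (number of values to compress, net bytes saved)
for n in (1, 3, 7, 15, 31, 63, 127, 255):
    saved = col_size * sum(rows_per_value[:n])           # cumulative saving
    overhead = math.ceil(math.log2(n + 1)) / 8.0 * tab_ct  # presence bits, all rows
    net = saved - overhead
    if net > best[1]:
        best = (n, net)
print(best)
```

Here the net saving peaks at three values: beyond that, each extra presence bit costs more across all rows than the rarer values give back.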
Actually, I've got all sorts of bits of flexibility written into the process, for example being able to specify a minimum column percentage compression saving or a maximum number of compression values to look at; fundamentally, though, I wonder whether it can be looked at in this simplistic way.
The proof of it, of course, is in the use of it but I thought I'd ask for comments anyway.
Besides, it might trigger an interest in others to consider a similar thing. (Actually, I do know this has already been tackled within NCR, but there you are).
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|