## Message Posted: Mon, 22 Oct 2007 @ 10:27:07 GMT

 < Last>>

 Subj: Compression Algorithm From: David Clough

Hi All,

I'm writing a Stored Procedure (well, actually, I've written it) which works out the break point for the number of columns to compress on for a table, column by column.

At the heart of it, however, I wonder whether I've over simplified it or got it just about right.

The way I'm tackling it is to cumulatively add up the space taken up by the column, based upon its size (for example, Integer being four bytes, etc) and multiplying that figure by the number of rows that 'attract' the number of compression values under consideration, which would be 1,3,7,15,31,127 and lastly 255.

Obviously, these values are based upon most prevalent values in the table.

From this, I then calculate the overhead taken up by the cumulative figure by using the following simple algorithm :

```     SET BIT_CT = CASE vFor2.COL_ROW_POS
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 7 THEN 3
WHEN 15 THEN 4
WHEN 31 THEN 5
WHEN 63 THEN 6
WHEN 127 THEN 7
WHEN 255 THEN 8 ELSE 0 END ;

```

So, as you can see, I'm calculating overhead as a fraction of one byte, multiplied by the total number of rows in the table.

The final calculation is a simple comparison between the space saving derived from a _particular cumulative set_ against the overhead taken up _across all rows_.

Actually, I've got all sorts of bits of flexibility written into the process, for example being able to specify a minimum column percentage compression saving or a maximum number of compression values to look at but, fundamentally, I wonder whether it can be looked at in this simplistic way.

The proof of it, of course, is in the use of it but I thought I'd ask for comments anyway.

Besides, it might trigger an interest in others to consider a similar thing. (Actually, I do know this has already been tackled within NCR, but there you are).

Thanks

Dave Clough
Database Designer
Express ICS

www.tnt.com

 < Last>>

Attachments

Library

Quick Reference

Archives

Sample Index

 2016 2007 2015 2006 2014 2005 2013 2004 2012 2003 2011 2002 2010 2001 2009 2000 2008 1999

2007 Indexes

 Jan Jul Feb Aug Mar Sep Apr Oct May Nov Jun Dec

 Top Home Privacy Feedback