Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Wed, 30 May 2001 @ 16:29:13 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Compress
 
From:   Mark Morris

With VARCHAR vs. COMPRESS, the key indicator is the difference between the maximum field length and the average field length. The space used by a VARCHAR column is given as:

        (avglength + 2) * cardinality

Wherease the space used by a compressed fixed length CHAR field is roughly given by

        (fieldlength) * (1 - compressablepercent) * cardinality  + (cardinality * pb)/8

        where pb=2 for a nullable column compressed on a non-NULL value and pb=1 otherwise.

All calculations are in bytes.

It's actually a little more complicated because there is a small amount of space in the table header to store the compressed value and also the presence bits may or may not cause another presence byte. I generally just consider the column in isolation as if presence bits did not have to be added in whole bytes.

If VARCHAR and COMPRESS both yield roughly the same capacity savings, I recommend VARCHAR. The reason being that although Teradata compression is very low CPU overhead, it is not zero. When compared against a fixed length uncompressed representation, the savings in CPU for I/O will more than compensate for the small compression CPU overhead.

So, let's take an example. Billion row table, nullable column, compress on NULL and most frequent value which occur 20% of the time. Character field has a maximum length of 12 and an average length of 8.

        UNCOMPRESSED capacity = 12,000,000,000 = 12 GB
        VARCHAR capacity = 1,000,000,000 * (8 + 2) = 10 GB
        COMPRESS capacity = 1,000,000,000 * 12 * (.8) + 1,000,000,000 * (2/8)
                                = 9,600,000,000 + 250,000,000
                                = 9,850,000,000 = 9.85 GB

In this example, both VARCHAR and COMPRESS save about 20% over an uncompressed fixed character field. I would probably still choose VARCHAR in this case even though the compression uses 1.5% less space.

Also - when looking for space saving opportunities don't forget about tightly specifying decimals when the range is well known. Here is the storage required as a function of decimal digits specified:

Decimal 1-2 digits              1 byte
Decimal 3-4 digits              2 bytes
Decimal 5-9 digits              4 bytes
Decimal 10-18 digits            8 bytes

Mark



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023