Archives of the TeradataForum
Message Posted: Wed, 26 Jan 2011 @ 16:22:57 GMT
Subj: Compressing infrequent values
From: Attila_Finta
Hi all. I have a simple question about value compression: is it better to compress all known values or only the most frequent values?
Simple example: the Currency Code column in a transactional table has roughly 50 distinct values, but only 9 of them each occur in more than 1% of rows.
All possible values for the column are known, and there are fewer than 255, but most of them occur in less than 1% of rows.
Col_Val  Val_Pct
-------  -------
USD        59.1%
EUR         7.6%
CNY         6.3%
JPY         5.0%
BRL         3.7%
GBP         3.3%
CAD         2.9%
AUD         2.6%
INR         2.5%
I can add the other 40 values to the compression value list, but would that make things more efficient operationally, or less? I suppose the simple
answer is: slightly less efficient loads and slightly more efficient retrieval. In general we want to optimize for retrieval. Do you therefore
advise adding all possible values, or only the most frequent ones?
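For a rough feel of the storage side of this trade-off, here is a back-of-envelope sketch in Python. It is not Teradata code, and it rests on assumptions that are not stated above: the column is CHAR(3), the full list is 49 values (the 9 shown plus the other 40), a compressed value takes no space in the row, and a value list of n entries costs every row the smallest number of presence bits b with 2^b - 1 >= n. Real rows pack presence bits into whole bytes shared with other columns, so actual numbers will differ.

```python
# Shares (percent of rows) of the 9 frequent currency codes, as listed in the post.
top_pct = {"USD": 59.1, "EUR": 7.6, "CNY": 6.3, "JPY": 5.0, "BRL": 3.7,
           "GBP": 3.3, "CAD": 2.9, "AUD": 2.6, "INR": 2.5}

COLUMN_BYTES = 3   # assumption: currency code stored as CHAR(3)
TOTAL_VALUES = 49  # assumption: the 9 listed values plus "the other 40"


def presence_bits(n_values: int) -> int:
    """Bits per row to index n compressed values (assumed rule: 2**bits - 1 >= n)."""
    return n_values.bit_length()


def avg_bytes_per_row(n_values: int, covered_fraction: float) -> float:
    """Presence bits paid by every row, plus full column width for uncovered rows."""
    return presence_bits(n_values) / 8 + (1 - covered_fraction) * COLUMN_BYTES


top_share = sum(top_pct.values()) / 100            # fraction of rows covered by the top 9

uncompressed = avg_bytes_per_row(0, 0.0)           # no value list at all
top9 = avg_bytes_per_row(len(top_pct), top_share)  # compress only the 9 frequent codes
all_values = avg_bytes_per_row(TOTAL_VALUES, 1.0)  # compress every known code

print(f"uncompressed: {uncompressed:.2f} bytes/row")
print(f"top 9 only:   {top9:.2f} bytes/row")
print(f"all 49:       {all_values:.2f} bytes/row")
```

Under these assumptions the two options come out close: about 0.71 bytes per row for the top 9 versus 0.75 for all 49, against 3 bytes uncompressed. The extra 40 values cover only about 7% of rows but cost two more presence bits on every row, so on storage alone the full list does not obviously win.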
Thanks.
Attila Finta