Archives of the TeradataForum
Message Posted: Tue, 02 Oct 2007 @ 20:13:03 GMT
Assuming the table is large enough to merit MVC (the large tables that together account for 95% of the space in a database are worthy of consideration), run a SELECT with GROUP BY on each of the columns under consideration, collecting the top N values (below 256) that each contribute at least half a percent of the total number of rows in the table. (I like the latter approach; you may miss a few flat-histogram columns, but this is an 80-20 type of task.) Also, avoid volatile columns and columns with several decimal places of precision, such as exchange rates measured to 6 significant figures.

Apply the algorithm in the manual, taking into account the cost of the additional bytes and bits. Be aware of the table header constraint and the limit on the number of characters (~8K) that may be compressed for one column. If the analysis of potential savings is a close call, do not compress. If MVC does not save at least 15% of a table's volume, it is not worth taking the time to implement, IMHO.

Example (you have the table size in rows available up front, or you store it in a table). The table has 100M rows:

    SELECT col_16, COUNT(*) ...
    GROUP BY col_16
    HAVING COUNT(*) > table-size-in-rows / 200   (in this case 500K rows)

The query returns the values in a column that account for at least 0.5% of the rows. You can further refine this query by using PERCENT_RANK() OVER (PARTITION BY ...), as sketched below.

Your thought about standard values such as zero, 1.0, and -1.0 often proves to be the case when compressing numeric values. Bottom line: go for the low-hanging fruit, and do it once a year or so.

Good Luck,
Bill Gregg
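For concreteness, here is one way that refinement might look. It is a minimal sketch, not the exact query from the post: the table name sales_detail and the literal row count of 100,000,000 are assumptions standing in for the 100M-row example above; only col_16 comes from the original. The inner query applies the 0.5% filter, and the outer query ranks the surviving values and caps the candidate list below the 256-value-per-column MVC limit. (A PARTITION BY clause would matter if you stacked several candidate columns into one result set; for a single column, an ORDER BY inside the window is enough.)

    /* Sketch only: sales_detail and the 100,000,000 row count are assumptions. */
    SELECT  col_16,
            value_rows,
            PERCENT_RANK() OVER (ORDER BY value_rows DESC) AS freq_rank
    FROM   (SELECT   col_16,
                     COUNT(*) AS value_rows
            FROM     sales_detail
            GROUP BY col_16
            HAVING   COUNT(*) >= 100000000 / 200   /* at least 0.5% of the rows */
           ) AS candidate_values
    QUALIFY ROW_NUMBER() OVER (ORDER BY value_rows DESC) <= 255  /* stay below 256 compressed values */
    ORDER BY value_rows DESC;

Once you have settled on the winning values, they go into the column's COMPRESS list in the DDL. Again a sketch, with an invented table, an invented key column, and an invented value list; the real list would come out of the query above:

    CREATE TABLE sales_detail_compressed
    ( txn_id  INTEGER NOT NULL,                  /* hypothetical key column */
      col_16  INTEGER COMPRESS (0, 1, 5, 99)     /* invented example value list */
    )
    PRIMARY INDEX (txn_id);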
Copyright 2016 - All Rights Reserved
Last Modified: 15 Jun 2023