Archives of the TeradataForum
Message Posted: Wed, 23 Feb 2005 @ 23:04:08 GMT
Chapter 10 in the database design manual covers PI selection criteria, basically Access Paths, Distribution and Volatility.
The design guidelines are to keep number of rows per PI value < 100 (reference V2R5.1 DB Design Manual, p 10-62, "Duplicate Row Checks for NUPI"),and to keep row variance < 5% (reference V2R5.1 DB Design Manual, p 10-32, "Analyzing Column Distribution Demographics to Optimize PI Selection").
Where row_skew is (Max Rows - Avg Rows) / Avg Rows
And row_variance and (Max Rows - Min Rows) / Min Rows.
Personally I think the 5% thumbrule is overly restrictive. It really doesn't do any good to have a flat data distribution if you have to redistribute the data every time you use it.
For fact tables with multi column PI's make sure each column adds significantly to the uniqueness. You need to look at proposed PI's carefully if you have developers with a background other than Teradata, they have a STRONG tendency to set PI = primary key, which is generally not a good thing. My general thumbrule for developers is each PI column should have at 10x number of distinct values as we have amps on the system. If a PI member has few distinct values, consider removing from PI. If all PI columns are weakly selective then the table is a good candidate for surrogate key. This assumes reasonably flat distribution, if skewed then need to look at it more closely (run joint distributions).
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|