Archives of the TeradataForum
Message Posted: Fri, 08 Jul 2005 @ 20:45:54 GMT
I have a pretty basic question (well, basic anyway. I'm not very pretty).
What's the best way to create a table? Suppose I have a table with several "dimension" columns and several "fact" columns, such that any combination of the "dimension" columns uniquely selects one row. I "want" to define the primary index as a composite (all the "dimensions"), but I'm not sure that's best. I read somewhere that a "surrogate" key would be better, supported by multiple secondary indexes (NUSI).
In this particular case, I want to optimize query performance; I don't care about loading. My project involves loading the result table once a month. But, would that make a difference in which technique to use?
For example, I tried something like this:
create multiset table MKT_SUMMARY , no fallback , no before journal , no after journal , checksum = default ( market SMALLINT , channel_id SMALLINT , payment_id SMALLINT , status_id SMALLINT , month_end DATE , fact1 INTEGER , fact2 DECIMAL(15,2) , fact3 DECIMAL(15,2) , fact4 DECIMAL(15,2) , fact5 DECIMAL(15,2) ) unique primary index ( market_id, channel_id, payment_id, status_id, month_end )
When I built and loaded this table, the distribution was not very good. I read about hash bucketing, so I think I understand why (a very large proportion of the rows is in one market; there are 40 different markets. Today, there are only two values for month_end, but over time many months will accumulate).
I suppose I could generate a random number as a surrogate key during each load, but does it need to unique within the table or do you create a non-unique primary index and make the surrogate unique within each load. The example has 5 "dimensions"; my real application has 15.
I'd be happy to read any technical documentation or publications on the subject, so if you have a reference, let me know.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|