Archives of the TeradataForum
Message Posted: Thu, 14 Jun 2012 @ 22:41:43 GMT
Looking at a dimensional model where there exists a billion row fact table and some million row dimension tables, what would be the best practice for creating primary indexes and or secondary indexes on the tables to provide efficient query execution?
An example might look like the following:
REGION_DIM TABLE Region_ID unique Country City State Region_Description Region_Code 100 rows PRODUCT_DIM TABLE Product_Dim_ID unique Product_Group Product_Family Product_Brand 5 million rows CUSTOMER_DIM TABLE Customer_Dim_ID unique Customer_Name Customer_State Customer_City 10 million rows ORGANIZATION_DIM TABLE Org_Dim_ID unique Org_Name Org_Region_ID Org_Division Org_Group Org_Office 5 K rows CALENDAR_DIM TABLE Cal_Date unique Cal_Week Cal_Month Cal_Quarter Cal_Year 7,300 rows
Each dimension provides a value to form the PK of the fact table.
Additive quantity fact attributes can be summarized from day up to quarter.
Would like ad-hoc capability such that not all dimension values are used.
Example: Show all products, customers, organization for a specific region over date.
Show specific customer for all products.
Show Customers supported by a specific organization.
Would the PI be all the dimension ids for distribution?? A specific combination of ids for the most common queries?
Partitioning by Date or Date part of the PI?
Several secondary indexes on various dimension ids?
Trial and error query explain plan evaluation?
Question too complex for email?
Is there a TD manual reference for this type of information?
|Copyright 2016 - All Rights Reserved|
|Last Modified: 27 Dec 2016|