![]() |
|
Archives of the TeradataForumMessage Posted: Thu, 14 Jun 2012 @ 22:41:43 GMT
Hi, Looking at a dimensional model where there exists a billion row fact table and some million row dimension tables, what would be the best practice for creating primary indexes and or secondary indexes on the tables to provide efficient query execution? An example might look like the following:
REGION_DIM TABLE
Region_ID unique
Country
City
State
Region_Description
Region_Code
100 rows
PRODUCT_DIM TABLE
Product_Dim_ID unique
Product_Group
Product_Family
Product_Brand
5 million rows
CUSTOMER_DIM TABLE
Customer_Dim_ID unique
Customer_Name
Customer_State
Customer_City
10 million rows
ORGANIZATION_DIM TABLE
Org_Dim_ID unique
Org_Name
Org_Region_ID
Org_Division
Org_Group
Org_Office
5 K rows
CALENDAR_DIM TABLE
Cal_Date unique
Cal_Week
Cal_Month
Cal_Quarter
Cal_Year
7,300 rows
RESULTS_FACT TABLE Each dimension provides a value to form the PK of the fact table. Additive quantity fact attributes can be summarized from day up to quarter. Would like ad-hoc capability such that not all dimension values are used. Example: Show all products, customers, organization for a specific region over date. Show specific customer for all products. Show Customers supported by a specific organization. Would the PI be all the dimension ids for distribution?? A specific combination of ids for the most common queries? Partitioning by Date or Date part of the PI? Several secondary indexes on various dimension ids? Trial and error query explain plan evaluation? Question too complex for email? Is there a TD manual reference for this type of information? Thanks, Nathan
| ||||||||||||||||||||||||||||||||||||||||||||||||
| | ||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||
|
| ||||||||||||||||||||||||||||||||||||||||||||||||
| Copyright 2016 - All Rights Reserved | ||||||||||||||||||||||||||||||||||||||||||||||||
| Last Modified: 15 Jun 2023 | ||||||||||||||||||||||||||||||||||||||||||||||||