Archives of the TeradataForum
Message Posted: Thu, 17 Mar 2005 @ 15:33:12 GMT
I suggest you take some classes. If you don't get this right, you will have a poorly performing database.
See below. These answers assume you at least understand the basis of Teradata Parallelism and how data is spread across the amps (workers). AMPS are now normally referred to as VPROCS (virtual processors). But I will do my best to explain as much as possible.
1. What is the difference between primary index (PI) and secondary index?
First a primary index is not the primary key. The primary key is what makes a row unique.
A primary index is an index that defines how the data gets distributed across the amps. The data is spread across the amps so that each amp in parallel can process only a portion of the data at a time. It can also be used for very fast retrieval of one or a few values in the primary index.
The choice of the primary index is the most important aspect of Teradata Performance. The factors in choosing a primary index are access paths and even distribution. Normally 'access paths' is the most important. 'Access Paths' can be two types join criteria and row lookup. 'Join criteria' is usually the most important factor in Data Warehousing, because you don't normally want to find a few rows. You would normally want to choose the columns used most frequently in joins as long as the data distributes fairly evenly (values are nearly unique). The reason is when tables with different PI's are joined one of the tables has to be redistributed to perform the join. When tables with the same PI are joined no redistribution of data occurs. Redistribution is only bad for large volume tables, you won't have problems with redistribution of small look up table. IE STATE_PROVINCES
A good example of this is a Customer table and an order table.
The key for the customer table is customer_id. The key for the order table is the order_id. Since these tables are routinely joined by customer_id, it would be best to have both tables PI's as customer_id. This assumes that you have a relatively even distribution of orders by customers.
However, don't choose the same primary index if there is a high amount of data distribution skew.
A primary index does not cost you any more space. You always have one. If you not define one it will be the first column (see answer two for other variations).
A secondary index is an index like in any other database. It is a subtable that can be used for faster access to the data in the main table. Because it is a subtable it consumes more space.
2. When i create a primary key constraint, is the index created automatically or explicitly i need to create an index.
When the table is created with the primary key constraint, A primary key constraint will get created as an index. If a primary index is not explicitly defined it will be created as a unique primary index (UPI). If a primary index is explicitly created then the primary key is created as a unique secondary index (USI).
When a primary key is added to a table is will become a USI. This is because a primary index has already been defined.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|