Archives of the TeradataForum
Message Posted: Wed, 22 Feb 2006 @ 17:30:25 GMT
The reason is that the user and / or application will hardly ever Know the exact timestamp to supply to the querys restrictions That are needed to make use of a multi-column PI. This is shown in the case provided here.
The SQL is providing only the acct_id in the SQL restrictions but not the Timestamp. The timestamp may be what is required to enforce the PK (although this would also be hard to use by the user for the same Reason). In this case how would the user tell the database which acct_id row that it wants to look at. In OLTP they may pick this off of a screen with A list of the acct_id rows. But in Decision Support (Data Warehousing), you should provide some way to tell the database which rows are valid for the analysis being requested.
So for the DW design on Teradata the PK is not the same as the PI.
You will still need to qualify the PK somehow to prevent Duplicate rows from being used in the analysis or returned to the user.
The PK is for logical uniqueness.
The PI is for physical access.
Providing a PI value will get you to the AMP and Data blocks(s) for that row hash value.
The PI does not need to be unique (UPI). I.e. There can be many duplicate rows For the same PI value. However, the more unique the PI is the better for Most type of processing. Anything under 500 duplicates is normally Perfectly fine. Above this you'll need to analyze the impact on the system.
The other potential problem with Non Unique Primary Indexes (NUPI) is that There can be some PI values that occur more often than others. I.e. There are some Acct_id that may have millions of transactions and other that have only a few. This is probably more of an issue than the number of duplilcates but only if the range is large. However if you follow the above guidelines of < 500 duplicates, you won't likely have a problem. That said, there is often a large spread for accounts due to their Specific transaction patterns.
If you have a design that does not fall within the above guidelines and or has PI values with a large amount of skew between the minimal number of PI values and the maximum number of PI values, then you will need to get some PS help, or provide much more design information to the list, or take a Teradata Design class to get a broad understanding of how the system functions.
*** Also the user can always supply more information than the Primary index and Teradata Will use the PI as well as the extra information to further restrict the contents of the SQL request Results.
While a NUSI may help either by providing a single column secondary index instead of the multi-column PI or the reverse, picking the right PI index is of great importance when doing design For Teradata. The performance difference between a PI request and even the scan of a NUSI index sub table is extremely large even for a moderately sized table.
The PI is a single amp Operation for a few data blocks. The NUSI sub table scan is a all amp operation that will touch all of the sub tables data Blocks. The difference could be many 1000's of times difference in performance.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 15 Jun 2023|