Re: Optimal PI [From Michael Larkins: Wed, 23 Feb 2005 @ 18:31 GMT]

Message Posted: Wed, 23 Feb 2005 @ 18:31:11 GMT

Subj:

Re: Optimal PI

From:

Michael Larkins

Hi John:

You seem to be asking the universal question regarding the selection of a PI. It has, I think, been generally accepted that the degree of data skewedness is only one factor to consider when distributing the data for a table.

Basically, you really need to look at how you access the data as well as how it is distributed. Sounds confusing but consider this: if you spend +90% of you time doing WHERE PI_column = 'value' than that is a great candidate for a PI. These accesses will be incredibly fast, especially when the value is unique. When it is non-unique, you also want to look at the distribution to determine if it is more than a candidate, then you consider distribution and uniqueness )or lack of it.

Otherwise, if you spend the vast majority doing joins, then it is a good idea to consider distribution on the join column(s) as the PI for distribution. Otherwise, every time you do a join in a query, the data you seek will need to be re-distributed in order to do the join. The re- distribution is in SPOOL, the distribution is of course in PERM space. If Teradata is not forced to re-distribute your data so often, joins run faster because the re-distribution is eliminated and you are not as likely to exhaust your SPOOL reserves just attempting to join millions of rows together because the re-distribution is eliminated.

This is a very short encapsulation of what is normally taught in a 3-day Physical Design type of course. You start with your logical model and extend it with data demographics, like uniqueness or the lack there of and access usage, value vs join. Than from the extended logical model you look at primary and secondary index candidates. Once the final decision is made on the PI, the remaining selection(s) are secondary. For instance, if you have a Primary Key in your logical model and it does not become the Primary Index, the uniqueness aspect of the PK is maintained as a USI.

This is by no means intended to be the end all answer to your question. However, I hope this gives you some additional "food for thought."

Regards,

Michael Larkins
Certified Teradata Master
Certified Teradata SQL Instructor

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Wed, 23 Feb 2005 @ 18:31:11 GMT