https:

Archives of the TeradataForum

Message Posted: Mon, 28 Mar 2016 @ 12:41:04 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: PK used as USI and not UPI

From:		Dieter Noeth

Anomy.Anom wrote:

Thanks, Irwing; but why choose a different Primary Index other than the PK in LDM?

Because you want fast queries, when you implement all PK as UPI you get a perfectly distributed system with horrible performance :)

Criteria for Choosing the Primary Index

Access = Maximize single-AMP operations

- Choose the column(s) most frequently used for JOIN (and WHERE conditions), usually a Primary or Foreign Key

Distribution = Optimize parallel processing

- Choose the column(s) that provides good distribution

- Better when more unique values and less rows per value

- Adding more columns might result in better distribution but less usability: The RowHash can only be calculated when all values are exactly known

Volatility = Reduce maintenance resource overhead (I/O)

- Choose a column with stable data values

- An UPDATE of the PI column effectively results in:
DELETE row - send row to new AMP - INSERT row (plus maintaining all Secondary/Join Indexes)

The Primary Key of a table is always a perfect fit based on distribution and volatility, as it's unique and stable.

But access is the most important criterion and a PK might not be accessed at all.

So data distribution has to be balanced with access usage in choosing a PI.

<\cut & paste>

Many PKs only exist to guarantee uniqueness in your logical model, but are hardly used in WHERE or JOIN-conditions. I usually don't consider implementing them as USI as it's just CPU/IO/Perm overhead and your ETL process has to check for PK-violations anyway.

In fact those rules are quite similar to choosing a Clustered Index in other DBMSes.

Dieter


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference