Archives of the TeradataForum

Message Posted: Wed, 17 Nov 2010 @ 21:20:08 GMT



Subj:		Re: Teradata's Indexing: PPI vs PI

From:		Dieter Noeth

Faible Mou wrote:

I'd like to know more about Primary Indexes, if it's not a trade secret.

Have a look at the Database Design manual; the only proprietary part is the actual hashing function.

SECTION 3 Physical Database Design, Part 1: Indexing

The basic concept is quite simple, at least compared to the complexity of the optimizer stuff :-)

I meant, how can this guarantee be possible? Hashing is bound to have collisions. Even 128-bit md5 can't avoid them. How big is the TD index to avoid hash collisions? Is its hashing result better (more uniformly distributed) than md5? How does its hashing speed compare to computing md5?

It's not hashing in the hash-table sense; it uses a hash algorithm to create a hash value, which is used as a kind of surrogate key.

The big advantage: It's always 4 bytes regardless of the actual data size of the PI.

Of course md5 hashes "better" and has fewer collisions; TD's hashing is like an advanced CRC32 (it's only 4 bytes).

As collisions can't be avoided, TD adds a second 4-byte value (which is actually a sequence value per hash value). The combination "row hash" plus "uniqueness value" results in an 8-byte unique id for each row in a table.
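A minimal sketch of that idea in Python may help. It is not Teradata's implementation: zlib.crc32 stands in for the proprietary hash function, and the RowIdAllocator class, the counter starting at 1, and the big-endian byte layout are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def row_hash(pi_value: str) -> int:
    """Stand-in 4-byte hash of the PI value (CRC32, NOT Teradata's real function)."""
    return zlib.crc32(pi_value.encode("utf-8")) & 0xFFFFFFFF


class RowIdAllocator:
    """Hypothetical helper: hands out 8-byte ids as row hash + uniqueness value."""

    def __init__(self) -> None:
        # One sequence counter per row hash value.
        self._last_uniq = defaultdict(int)

    def allocate(self, pi_value: str) -> bytes:
        h = row_hash(pi_value)
        # Rows whose PI values hash to the same row hash (duplicates or true
        # collisions) get the next sequence number for that hash.
        self._last_uniq[h] += 1
        uniq = self._last_uniq[h]
        # 4 bytes of hash followed by 4 bytes of uniqueness (layout assumed).
        return h.to_bytes(4, "big") + uniq.to_bytes(4, "big")


if __name__ == "__main__":
    alloc = RowIdAllocator()
    for value in ["Smith", "Jones", "Smith"]:
        print(value, alloc.allocate(value).hex())
```

The two "Smith" rows share the first 4 bytes and differ only in the trailing uniqueness value, which is what keeps the 8-byte id unique even though the 4-byte hash is not.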

And this RowID is used for distribution across AMPs and sorting within AMPs.
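To make the distribution and sorting step concrete, here is a rough Python sketch. It assumes the AMP is picked from the row hash part by a plain modulo, which is a simplification since the actual hash-to-AMP mapping isn't described here; NUM_AMPS, amp_for and place_rows are hypothetical names, and zlib.crc32 again only stands in for the proprietary hash.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def row_hash(pi_value: str) -> int:
    """Stand-in 4-byte hash (CRC32, NOT Teradata's real function)."""
    return zlib.crc32(pi_value.encode("utf-8")) & 0xFFFFFFFF


def amp_for(hash_value: int, num_amps: int = NUM_AMPS) -> int:
    """Assumed mapping: the row hash alone decides the AMP (simple modulo)."""
    return hash_value % num_amps


def place_rows(pi_values, num_amps: int = NUM_AMPS):
    """Distribute rows to AMPs, then keep each AMP's rows sorted by RowID."""
    amps = {amp: [] for amp in range(num_amps)}
    last_uniq = {}  # per-row-hash sequence counter
    for value in pi_values:
        h = row_hash(value)
        uniq = last_uniq.get(h, 0) + 1
        last_uniq[h] = uniq
        # RowID modelled as the pair (row hash, uniqueness value).
        amps[amp_for(h, num_amps)].append(((h, uniq), value))
    for rows in amps.values():
        rows.sort(key=lambda row: row[0])
    return amps


if __name__ == "__main__":
    for amp, rows in place_rows(["Smith", "Jones", "Smith", "Meier"]).items():
        print(amp, rows)
```

Because the AMP depends only on the hash of the PI value, all rows with the same PI value land on the same AMP, and within that AMP they sit next to each other in RowID order.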

Dieter

