Archives of the TeradataForum

Message Posted: Sat, 13 Nov 2010 @ 09:55:33 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Teradata's Indexing: PPI vs PI

From:		Dieter Noeth

Faible Mou wrote:

My understanding of indexing, in general, is that there will be some kind of (b-)trees involved. Once a field is indexed, searching for a record will descend a tree from the top to the bottom. Now,

You might think of a Teradata Index as a kind of b-tree which has *always* three levels on each AMP:

1. Master Index, always in memory, created during startup, sorted by TableID+RowID, used to locate

2. Cylinder Index, first logical I/O, may be in cache, sorted by TableID+RowID, used to locate

3. Datablock, second logical I/O, probably not cached, rows on a heap, sorted using a pointer array

For tables with a PPI, Teradata utilizes a 3-level partitioning scheme to distribute and later locate the data. The 3 levels are:

1. Rows are distributed across all AMPs (and accessed via the Primary Index) based upon DSW portion of the Row Hash.

This is similar to a PARTITION BY HASH in other DBMSes

2. At the AMP level, rows are first ordered by their partition number.

3. Within the partition, data rows are logically stored in Row ID sequence.

About the 3rd point, is it talking about the actual data rows from the table, or the index it self?

A Primary Index is like an encyclopedia, as soon as you find the keyword, you also got the data (other DBMSes might call it Clustered Index).

If it is the actual data rows from the table, > how can randomly inserted data rows be arranged so organized?

The same as any b-tree.

As you wrote, it's only *locically* stored in RowID sequence, thus you just have to maintain the *logical* order.

How about any row being deleted?

It's just removed.

As datablocks in Teradata don't have a fixed size and Teradata always cleans up after insert/update/delete (a kind of immediate mini-reorg), this might result in moving some bytes on a datablock in memory and writing a new smaller datablock.

The size of a datablock is 1 to 127 sectors, where a sector is 512 bytes.

"A query that requests "order information" (with a WHERE condition that specifies a range of dates) will result in a full table scan of the NPPI table"

For the above statement, the "full table scan", does it mean traversing the whole table or traversing the index tree only?

index tree = table :-)

Dieter


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference