|
|
Archives of the TeradataForum
Message Posted: Sat, 13 Nov 2010 @ 09:55:33 GMT
Subj: | | Re: Teradata's Indexing: PPI vs PI |
|
From: | | Dieter Noeth |
Faible Mou wrote:
| My understanding of indexing, in general, is that there will be some kind of (b-)trees involved. Once a field is indexed, searching for a
record will descend a tree from the top to the bottom. Now, | |
You might think of a Teradata Index as a kind of b-tree which has *always* three levels on each AMP:
1. Master Index, always in memory, created during startup, sorted by TableID+RowID, used to locate
2. Cylinder Index, first logical I/O, may be in cache, sorted by TableID+RowID, used to locate
3. Datablock, second logical I/O, probably not cached, rows on a heap, sorted using a pointer array
| For tables with a PPI, Teradata utilizes a 3-level partitioning scheme to distribute and later locate the data. The 3 levels
are: | |
| 1. Rows are distributed across all AMPs (and accessed via the Primary Index) based upon DSW portion of the Row Hash. | |
This is similar to a PARTITION BY HASH in other DBMSes
| 2. At the AMP level, rows are first ordered by their partition number. | |
| 3. Within the partition, data rows are logically stored in Row ID sequence. | |
| About the 3rd point, is it talking about the actual data rows from the table, or the index it self? | |
A Primary Index is like an encyclopedia, as soon as you find the keyword, you also got the data (other DBMSes might call it Clustered
Index).
| If it is the actual data rows from the table, > how can randomly inserted data rows be arranged so organized? | |
The same as any b-tree.
As you wrote, it's only *locically* stored in RowID sequence, thus you just have to maintain the *logical* order.
| How about any row being deleted? | |
It's just removed.
As datablocks in Teradata don't have a fixed size and Teradata always cleans up after insert/update/delete (a kind of immediate mini-reorg),
this might result in moving some bytes on a datablock in memory and writing a new smaller datablock.
The size of a datablock is 1 to 127 sectors, where a sector is 512 bytes.
| "A query that requests "order information" (with a WHERE condition that specifies a range of dates) will result in a full table scan
of the NPPI table" | |
| For the above statement, the "full table scan", does it mean traversing the whole table or traversing the index tree only? | |
index tree = table :-)
Dieter
| |