Archives of the TeradataForum

Message Posted: Tue, 14 Mar 2006 @ 11:07:43 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: PPI and Master/Cylinder Index structures

From:		Anomy Anom

<-- Anonymously Posted: Monday, March 13, 2006 19:15 -->

Hi Michael,

Partition #0 (zero) is always assumed for all rows in every non PPI (NPPI) table.

When a system is upgraded to V2R5, a conversion script runs to modify the internal layout of both Master Index (MI) and Cylinder Indexes (CIs), adding the Partition field (length 16 bits) to both of them, so we have now three cylinder/block identifying fields, Table ID, Partition #, and Row Hash, then the conversion populates the data structures with zeroes for the new Partition field (during the migration no PPI tables exist yet).

Finally both MI and CIs entries are sorted by Table ID, Partition #, and Row Hash.

Data and index subtables belonging to NPPI tables are left untouched during the migration.

When you start running your queries in V2R5 and you have for instance a single-AMP retrieve using the Primary Index value, a new argument, i.e. Partition #, will be added to Table ID and Row Hash, no matter if you access an NPPI or PPI table, and it will be zero by default for any NPPI table.

So you will not suffer any processing overhead because the MI and CIs look up processing already include the Partition # in every access.

As new PPI tables are created they will have Partition # values greater than zero, and their entries in MI/CIs will be in the needed order by Table ID, Partition #, and Row Hash.

At first sight secondary indexes structures would not need the Partition # for NPPI tables, but I guess that in order to have a standard row key layout (row key replaces the concept of row id, while formerly we had Row Hash + Uniqueness Value, we now have Partition #, + Row Hash + Uniqueness, the row key has 10 bytes as it has been explained earlier), probably when a new secondary index is built for NPPI it also includes the Partition # equal to zero, using the two additional bytes. I guess this dependes directly on the code.

Each row in a PPI table has the Partition # field present, NPPI rows do not include it.

After these changes to the File System, and considering your questions, the same level of performance seems to be achieved when accessing rows by means of secondary indexes, at least in my experience performance is as good as it has been before.

But AFAIK there's not a way to find directly a specific row hash within a partition, the AMP will write all PPI rows first logically ordered by Table ID, then Partition #, and finally row hash within the partition. Consider no new internal control structures have been added to support PPI.

As we know the advantage is that in PPI all rows within the AMP, which belong to same partition, are logically adjacent, speeding up significantly your queries based on range values for your partitioning column, specially when only a few partitions qualify, and you can also have control over whole partitions for maintenance purposes.

The only disadvantages I know happen 1) when your partitioning column does not belong to the PI, so it must be a NUPI, and hence all AMPs must scan their partition rows when looking for a specific PI value (the recommendation to avoid this is to define a USI on the NUPI if values are indeed unique) and 2) when you try to join PPI and NPPI tables, and they have the same PI (row order within each AMP is different for PPI and NPPI). In both cases the disadvantage may not be so relevant if you have a small number of partitions in your PPI.

Hope this helps.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference