Re: A query on the PI properties of Teradata [From Frank.C.Martinez: Thu, 23 Feb 2006 @ 10:41 GMT]

Message Posted: Thu, 23 Feb 2006 @ 10:41:38 GMT

Subj:

Re: A query on the PI properties of Teradata

From:

Frank.C.Martinez

Ahh, I love rules of thumb. Especially when I accidentally hit them with a hammer. You know, the "500" rule that you're quoting used to be "100". Then it switched to "200" rows with the same hash. At some point in time it changed back in the Teradata documentation to "100". But at no time does that really ever matter. It's still a rule of thumb.

The actual constraint is, how many rows you can get in a block, because that's what you want to achieve, as many of the rows with the same hash in the same block. I think because the first record in a hash chain (notice that it the chain of records based on hash, not on the values of the PI columns, because you can also have hash collisions) goes in (let's call it for theoretical sake) position N. The next record on the chain goes into position N+1. The next goes into Position N-1. The next goes into N+2 and so forth. So if you put enough records into a chain that exceed the number that fit in a block (and that would vary by the compression AND how many compressible values are in the row, the size of the VARCHAR's, etc.), you'll extend into block + 1, then block - 1, then block + 2, etc. Which could be bad. Very bad. And I've seen it, and so have many of nuestros amigos on this list.

And it can be very nasty. Say that you're decided to shoot yourself in the foot, and coded missing values in a column as -1 instead of NULL. Or you have Customer Number and Date as your PI on a transaction table, which works great with people, but bad with government agencies. And you've done ok from the day you put the table into place. Suddenly, you go after those long chains (maybe you did it on purpose or maybe the Explain plan changed because the statistics changed), and they all migrate to spool on the same AMP. And that AMP now is doing much more than it's fair share of the load. So you're stuck watching your box Hot Amp all over the place. Yucky, bad, and difficult to find. Then some idiot decides that he needs to kill that particular UPDATE. So now you've got a rollback with it's own Hot Amp. Time to get a new job. Maybe as a dba on a non-MPP database.

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Thu, 23 Feb 2006 @ 10:41:38 GMT