data:image/s3,"s3://crabby-images/835f0/835f0183d68360e38201c0eea348393d05ddc0cf" alt="" |
data:image/s3,"s3://crabby-images/fedd4/fedd46b5eddfc5d215c8fcb543c21c47cbcce0b1" alt="" |
Archives of the TeradataForum
Message Posted: Thu, 23 Feb 2006 @ 10:41:38 GMT
Subj: | | Re: A query on the PI properties of Teradata |
|
From: | | Frank.C.Martinez |
Ahh, I love rules of thumb. Especially when I accidentally hit them with a hammer. You know, the "500" rule that you're quoting used to be
"100". Then it switched to "200" rows with the same hash. At some point in time it changed back in the Teradata documentation to "100". But at
no time does that really ever matter. It's still a rule of thumb.
The actual constraint is, how many rows you can get in a block, because that's what you want to achieve, as many of the rows with the same hash
in the same block. I think because the first record in a hash chain (notice that it the chain of records based on hash, not on the values of the
PI columns, because you can also have hash collisions) goes in (let's call it for theoretical sake) position N. The next record on the chain goes
into position N+1. The next goes into Position N-1. The next goes into N+2 and so forth. So if you put enough records into a chain that exceed
the number that fit in a block (and that would vary by the compression AND how many compressible values are in the row, the size of the VARCHAR's,
etc.), you'll extend into block + 1, then block - 1, then block + 2, etc. Which could be bad. Very bad. And I've seen it, and so have many of
nuestros amigos on this list.
And it can be very nasty. Say that you're decided to shoot yourself in the foot, and coded missing values in a column as -1 instead of NULL.
Or you have Customer Number and Date as your PI on a transaction table, which works great with people, but bad with government agencies. And
you've done ok from the day you put the table into place. Suddenly, you go after those long chains (maybe you did it on purpose or maybe the
Explain plan changed because the statistics changed), and they all migrate to spool on the same AMP. And that AMP now is doing much more than
it's fair share of the load. So you're stuck watching your box Hot Amp all over the place. Yucky, bad, and difficult to find. Then some idiot
decides that he needs to kill that particular UPDATE. So now you've got a rollback with it's own Hot Amp. Time to get a new job. Maybe as a dba
on a non-MPP database.
iv
| |