Archives of the TeradataForum
Message Posted: Fri, 28 Apr 2006 @ 22:12:35 GMT
Subj: Re: Question about UPI Vs NUPI
From: Walter, Todd
| If you use a Set table then while loading it has to do the dup checking to make sure the entire record is unique. Time consuming. If you
| go with a multiset table and a UPI then it would just check for the PI to be unique.
This seems to be a confusing and poorly understood topic...
IF there is a UPI, then no duplicate row checking is ever required or performed by Teradata. Only the UPI value of a new row needs to be
checked against other existing UPI values, and only within the bounds of the row hash to which the new row hashes. Thus MULTISET is never
required and is in fact redundant on a table with a UPI.
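A minimal DDL sketch of that point (database, table, and column names here are made up for illustration):

    CREATE SET TABLE sandbox.cust
    ( cust_id    INTEGER NOT NULL
    , cust_name  VARCHAR(100)
    )
    UNIQUE PRIMARY INDEX (cust_id);

    -- The UPI on cust_id already rejects any new row whose cust_id duplicates an
    -- existing one, and it checks only within that one row hash. Declaring this
    -- table MULTISET instead of SET would change nothing about that check.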
IF there is a NUPI and the table is a SET table, then Teradata must compare the full contents of the row against other rows with the same NUPI
value in order to enforce the unique row constraint. IF there are a LOT of rows with the same NUPI value, then this process can get expensive. A
"lot" is a little fuzzy, but the rule of thumb is more duplicates than fit in a few data blocks. Take your row size, divide it into the average
block size for the table (use 48K if you don't know), and multiply by 3 to get a rough boundary to use. For narrow rows this will be a couple
thousand duplicates; for wide rows it can be substantially less. If your average NUPI value has a frequency greater than that rule of thumb, or
if you have some really skewed NUPI values that are way over that ROT even though the average is small, then MULTISET would be indicated.
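To make the arithmetic concrete (numbers assumed for illustration): with 100-byte rows and 48K blocks, 48,000 / 100 = 480 rows per block,
times 3 gives a rough boundary of about 1,440 duplicates per NUPI value. A sketch of a query you might run to compare your actual NUPI
frequencies against that boundary (table and column names hypothetical):

    SELECT cust_region, COUNT(*) AS nupi_freq
    FROM sandbox.cust_hist
    GROUP BY cust_region
    HAVING COUNT(*) > 1440   -- rough boundary: 3 * (48000 / 100-byte rows)
    ORDER BY nupi_freq DESC;

Any NUPI value this returns is a candidate reason to go MULTISET, or to pick a different NUPI altogether.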
The price of MULTISET is that the system allows duplicate records. Fine if that is what you intend, but if that is not the intended behavior,
then you need to monitor for data problems that can creep in because of restarted/rerun loads, broken ETL processes, and so on, which would
otherwise be caught by the dup row check.
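If you do go MULTISET, a periodic full-row duplicate check is one way to watch for those problems; a sketch follows (table and column names
hypothetical, and the GROUP BY must list every column of the table to make it a true full-row check):

    SELECT cust_id, cust_name, COUNT(*) AS dup_cnt
    FROM sandbox.cust_multi
    GROUP BY cust_id, cust_name
    HAVING COUNT(*) > 1;
    -- Any row returned is a duplicate record that a SET table
    -- would have rejected, e.g. from a restarted/rerun load.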