Archives of the TeradataForum
Message Posted: Thu, 29 Oct 2015 @ 21:53:22 GMT
Subj: Re: SET vs MULTISET tables with UPI/USI
From: Walter, Todd
- UPI and USI have to be considered separately.
- With a UPI, only the UPI columns are compared, which is a small cost. Specifying MULTISET on a UPI table is meaningless anyway, since there
cannot be duplicate rows.
- A NUPI with highly unique data in the NUPI columns also has an acceptable cost. We compare the remaining columns only for rows with the same
NUPI value, so if that set is small there will be few compares. Note that the rows for a NUPI value may also be split up among many partitions,
in which case we look only at the rows in the one partition into which the row is to be inserted. The number of rows to compare per NUPI value
is therefore per partition, not the total for that NUPI value.
- A USI is updated only after the base row has been inserted. We do the same duplicate-row tests we would do if the USI were not present. Once
those have completed and the row has been inserted, we send the USI columns to the appropriate AMP for insertion and enforcement of the USI
constraint.
- MULTISET is very valuable when the table has a NUPI that is highly non-unique or has highly skewed values. Removing the duplicate-row
enforcement can significantly reduce the cost of inserting in these cases. Of course, it is then the ETL/ELT process's responsibility not to
generate or insert duplicates (if duplicates are not desired in the table).
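
To make the NUPI points above concrete, here is a rough toy model in Python that counts full-row compares on insert. It is only a sketch of the reasoning in this post, not of Teradata internals: the class ToyTable, the helper load, and the column names pi, part and payload are all made up for illustration, and UPI/USI handling is left out.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of the duplicate-row check on insert (hypothetical, not Teradata code)."""

    def __init__(self, pi_cols, is_set=True, partition_col=None):
        self.pi_cols = pi_cols              # non-unique primary index columns
        self.is_set = is_set                # True = SET table, False = MULTISET
        self.partition_col = partition_col  # None = not partitioned
        # Rows grouped by (partition, NUPI value).
        self.buckets = defaultdict(list)
        self.full_row_compares = 0

    def insert(self, row):
        pi_value = tuple(row[c] for c in self.pi_cols)
        partition = row[self.partition_col] if self.partition_col else None
        bucket = self.buckets[(partition, pi_value)]
        if self.is_set:
            # SET: compare against rows with the same NUPI value,
            # and only those in the target partition.
            for existing in bucket:
                self.full_row_compares += 1
                if existing == row:
                    return False  # duplicate row rejected
        # MULTISET: no duplicate-row check at all.
        bucket.append(row)
        return True


def load(table, n_rows, n_pi_values, n_partitions=1):
    """Insert n_rows distinct rows and return the number of full-row compares."""
    for i in range(n_rows):
        table.insert({
            "pi": i % n_pi_values,
            "part": (i // n_pi_values) % n_partitions,
            "payload": i,
        })
    return table.full_row_compares


print(load(ToyTable(["pi"]), 1000, 1000))                           # 0
print(load(ToyTable(["pi"]), 1000, 10))                             # 49500
print(load(ToyTable(["pi"], partition_col="part"), 1000, 10, 10))   # 4500
print(load(ToyTable(["pi"], is_set=False), 1000, 10))               # 0
```

In this model, 1000 rows over 1000 distinct NUPI values need no compares, the same rows over only 10 NUPI values need 49500, spreading those across 10 partitions cuts that to 4500, and MULTISET needs none. The absolute numbers mean nothing for a real system; only the shape of the growth is the point.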