Archives of the TeradataForum
Message Posted: Thu, 29 Oct 2015 @ 21:53:22 GMT
- UPI and USI have to be considered separately.
- With a UPI, only the UPI columns are compared, which is a small cost. Specifying MULTISET on a UPI table is meaningless anyway, since there cannot be duplicate rows.
- A NUPI with highly unique data in the NUPI columns will also have an acceptable cost. We only compare the remaining columns against the rows with the same NUPI value, and if that set is small there will be few compares. Note that the rows for a NUPI value may also be split among many partitions. In that case we look only at the rows with that NUPI value in the one partition into which the row is to be inserted, so the number of rows that matters is per NUPI value per partition, not the total for that NUPI value.
- A USI is only updated after the base row has been inserted. We do the same duplicate-row tests we would do if the USI were not present. Once those have completed and the row has been inserted, we send the USI columns to the appropriate AMP for insertion and enforcement of the USI constraint.
- Multiset is very valuable when the table has a NUPI that is highly non-unique or has highly skewed values. Removing the duplicate-row enforcement can significantly reduce the cost of inserting in these cases. Of course, it is then the ETL/ELT process's responsibility not to generate or insert duplicates (if duplicates are not desired in the table).
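
To put rough numbers on the points above, here is a minimal Python sketch of a toy model. It is not Teradata's actual implementation: the `ToyTable` class, the `load` helper and the compare counter are all hypothetical names invented for illustration, and partitioning and the USI step are left out. It only counts how many full-row comparisons a duplicate-row check would need under the behaviour described in this post.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of duplicate-row checking on insert (an assumption, not Teradata internals)."""

    def __init__(self, kind, pi_cols, unique_pi=False):
        self.kind = kind              # "SET" or "MULTISET"
        self.pi_cols = pi_cols        # column positions that form the primary index
        self.unique_pi = unique_pi    # True models a UPI, False models a NUPI
        self.rows_by_pi = defaultdict(list)
        self.full_row_compares = 0    # number of full-row comparisons performed

    def insert(self, row):
        key = tuple(row[i] for i in self.pi_cols)
        bucket = self.rows_by_pi[key]
        if self.unique_pi:
            # UPI: only the index columns are checked, no full-row compare.
            if bucket:
                return False
        elif self.kind == "SET":
            # NUPI on a SET table: compare against every row with the same NUPI value.
            for existing in bucket:
                self.full_row_compares += 1
                if existing == row:
                    return False
        # MULTISET with a NUPI: no duplicate-row check at all.
        bucket.append(row)
        return True


def load(table, rows):
    """Insert all rows and return the number of full-row comparisons made."""
    for row in rows:
        table.insert(row)
    return table.full_row_compares


# 1000 distinct rows that all share the same NUPI value (a highly skewed NUPI).
skewed_rows = [(1, i) for i in range(1000)]
# 1000 rows whose NUPI values are all different (a highly unique NUPI).
unique_rows = [(i, i) for i in range(1000)]

skewed_set = load(ToyTable("SET", [0]), skewed_rows)            # 499500
skewed_multiset = load(ToyTable("MULTISET", [0]), skewed_rows)  # 0
unique_set = load(ToyTable("SET", [0]), unique_rows)            # 0
upi_set = load(ToyTable("SET", [0], unique_pi=True), unique_rows)  # 0
```

In this model the skewed NUPI on a SET table costs n(n-1)/2 compares for n rows with the same NUPI value, while the same load into a MULTISET table, or a load with highly unique NUPI values, costs none. Real costs depend on factors the sketch ignores, such as partitioning, which reduces the number of rows compared per NUPI value.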
|Last Modified: 24 Jul 2020|