Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Wed, 26 Apr 2006 @ 18:06:09 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Question about UPI Vs NUPI
 
From:   Kohut, Eric

The decision that you need to make is a little different.

Co-location of tables is almost always a good approach to minimize data movement; if when using that co-located PI, the following data demographics goals are met:

1. Distribution is even across the amps 0 - 5 % is ideal, 5 - 15 is normally manageable, 15 + % you should test.

2. The average # of NUPI duplicate values, for referenced data, is not extremely large. Large is relative. The most simple rule of thumb is how many rows will fit into 1 or 2 data blocks (Usually 1000 - 2000 duplicates). This can be extended, sometimes very far, but you have to understand the trade offs of how this changes the ways the system functions. For the right purpose, I've seen 100,000 of duplicate values work but there is an impact.

3. The highest # of NUPI duplicate values, for referenced data, is not very different than the next N highest values where N is the number of amps. Ideally less than 10 - 20 % difference. The amp that gets this highest # of duplicate values may always be doing more work and therefore always behind the others. I mentioned "for referenced data" since if you are excluding this data for most queries. I.e. Negative numbers for columnX where a view might have columnX > 0 and thus applied to all queries. This will only be a problem for data loading. That said, a high number of exact same duplicate values can be very problematic. I.E. don't use a default value for PI column or a column that is part of a PI if at all possible. There are many ways to work around this depending on your data, if needed.


With this said, the table type and Primary Index type decision comes as a result of these other decisions.

You should only use Multi-set tables if you need the following:

1. Loads are going slowly due to duplicate row checking and data work arounds (Alternatives to default values) are not an option. This also assume that you have a very reliable and trusted ETL process. I.E. You should to do a check to see if the incoming rows already exists in the table.

2. For some reason you want to load duplicate rows. I would never recommend this!!!


UPI and NUPI simpy means that the Primary Index columns are unique or they not unique. This is proven by the data and / or the logical constrants from logical model, source constraints, ETL, etc.....



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023