Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 31 Jan 2006 @ 12:48:33 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Use of Multiset for Large Table
 
From:   Dunweber, Ole

Hello again,

Entertaining subject, one final reply from me:

Anomy replied to me:

  <SNIP>How many business questions can you answer (using the PI), if you have put say 10 columns into it, to make it unique?<SNIP>  


  In 17 years of designing and working with Relational databases I never encountered a "natural identifier" that was 10 columns long. 5, maybe 6 at most and in some those instances a surrogate key was used for inheritance purposes (FK). The particular table in question with 20B rows has 3 columns as the UPI, all 3 columns are FKs to dimensions. In a properly design Dimensional Model, it is very easy to get the dimensions that make up the PI of the large table into the SQL picture and help the OPTIMIZER put 2+2 together. The key is to ensure people know the data model, know how to properly use the tables, know the nuances that Teradata's PIs play in access to the information, and know the nuances that adding context in the form of tables supplying FKs to the PI definition can help the optimizer make the connection to use the PI (whether UPI or NUPI, makes no difference).  


How about business questions only specifying 2 of the dimensions in the UPI? Are they not sufficiently educated?


  If the table has 20B rows (1Tb) I have to do something to speed up inserts while preserving the integrity of the data. A PPI and a USI make it easy to maintain without sacrificing access and without the use of a multiset definition.  



This seems to contradict what you write later on:

  USI's are the Kiss of Death for performance.  



Regarding PPI speeding up inserts:

Yes, but only if the partitioning matches the arrival pattern of the data.

I doubt you can accomplish this on all your big tables - at least not without putting say the date-column redundantly into some of the tables, where it does not belong 3NF-wise.

Also this advantage lessons with the number of NUSI's and Join indexes on the table.

Using PPI is a fundamental change to the physical design, which must be carefully analyzed with the entire workload.


  <SNIP>Do you really consider the Data warehouse to be yours? In my humble opinion, it belongs to the business.<SNIP>  


  I must explain that I am not a DBA (although I play one here). While the hardware and Software belong to the business, I am charged by the business to be the STEWARD of the DATA WAREHOUSE. As the Steward, I am responsible for its well being and the "owners" expect that. MULTISETS are not in the best interest of the data warehouse or the business for two reasons: They violate RELATIONAL theory and they have the potential to introduce serious data integrity problems. The first one is reason enough to keep them out of the data model, the second one violates the premise of the DW being a single version of the truth (if duplicate rows yield improper results). Teradata already makes it difficult to implement reasonable Referential Integrity because it requires USI's and USI's are the Kiss of Death for performance. No need to aggravate the situation by adding Multisets to the mix. Let's remember that a PI is not a PK and DUPLICATE Primary Identifiers are indeed possible under Teradata's NUPIs.  


  Anyone with some relational database background that takes the 2 week class in Atlanta can be a Teradata DBA, but not everyone is able to bring more value to the organization by doing the role of a Data Architect and Data Administrator. In that role I make sure the data warehouse has a well designed, full blown data model that faithfully represents the business and is implemented in the Teradata box in a consistent state (RI maintained). I spend a good deal of my time evangelizing the proper use of its contents by promoting the concepts stored in the data warehouse and ensuring they are used properly. So yes, I sort of "own" the data warehouse because I act as the gatekeeper charged to ensure that whatever needs to get into the data warehouse gets properly modeled and defined. Nothing goes in or out unless it is modeled first (I do the modeling, so you get the idea).  


Yes, you probably bring more value to the business regarding integrity.

But you bring less value to the business regarding performance, time-to-market of new applications, and the flexibility of business questions which can be answered (with reasonable performance).

Is the business aware of these pros and cons of your strict stewardship?


And I totally agree with Victor Sokovin:

  "The argument that multiset tables introduce an additional risk is irrelevant provided you delegate the responsibility to the ETL and QA"  



Regards,

Ole Dunweber
Coop Nordic



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023