Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Mon, 11 Nov 2002 @ 18:04:37 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: 3NF vs Dimensional modelling
 
From:   McBride, Michael

To Neil:

Since you raised so many objections to my post, I find it hard to know exactly where to start, but I will try to address most of your points systematically, if I may, and I will try to leave my, non-essential, somewhat wry humor out of it...

First I want to be on record as also having 20+ years experience in the Information Systems industry, initially as a 'report' programmer, and, after earning my BS in CS for system design in 1988, moved into data architecture and database design, and, even though Enterprise (conceptual/logical) Data Modeling is 'platform' independent, I have been implementing NF databases on Teradata since 1989.

To address the 'bigot' aspect of my opening remark, I must start by saying that the debate over NF database implementations vs. DM implementations is just an extension (and derivative) of the old 'Data' driven versus 'Application' driven system development lifecycle methodologies wars. Since I have always been in the 'Data' driven camp, I have been "labeled" a data bigot, and intolerant of other perspectives or beliefs, if you will. Since others have labeled me that way, then I will just have to carry it as a badge of honor, but I believe if you read my post carefully, you will see that I am not in-tolerant of the DM concept, just at odds with it's purpose and usefulness in the overall 'solution', and, therefore, resist any notion to use that schema at the EDW core. If that makes me bigoted about database design, than so be it. On that note, I firmly believe that business 'applications' are transient and ever changing as the business changes, but the underlying corporate 'data' is forever a 'fixed' asset, and needs to managed as such! - This comes from my Data Administration background.

Leaving the emotional realm and moving into the scientific, normalization of databases is but one of three conventional database theories. The three concepts are 1) Hierarchical (root and child segments), 2) Network (linked list), 3) Relational (relations and tuples). You can actually implement all three designs using any of the other two model types, as the essence of the model is not in it's inception, but how it presents the business rules in and around the manifest data. Relational theory is a mathematical model, and, as such, has mathematical proofs for each level or state of normalization at which the data resides. The higher the degree of normalization, the more mathematically correct (or efficient) state of the data. I concur with what Geoffrey Rommel wrote: "they are explicitly designed to avoid having to update two or more tables when you are updating only one piece of data, and to avoid the well-known insert, update, and delete anomalies" - end quote. Given, that this is to the advantage of OLTP, is not the same thing as saying NF models are only for OLTP, in fact Relational theory knows nothing of OLTP. What Relational Theory does suggest (and history has borne the truth of it), data contained in relational database designs, will have the most 'stable', most 'reliable', most 'accurate' and most 'flexible' form of existence.

You said: Let me tell you what my ulterior motive is - to help my clients be successful by designing and implementing the best solutions. I've been at this for 20 years, and I know how to design any kind of model. I do what works.

The difference is, I do what works 'best' for the problem at hand, not just want 'works'. As a consultant, I seek to give a client the 'best' solution for the stated problem. Finding the best business solution hinges on many factors. I always ask the customer some basic and key, fundamental, business questions to determine the lifetime expectancy of the organizations data, the degree of inter-departmental dependencies, the level of detail, and the breath of applications across the entire enterprise, commitment to 'one' version of the truth, and so on. The answers to these, as well as evaluating other functionality issues and requirements will lead me to the 'best' solution both for the immediate need, and for the long term as well. I try to give the client the "big picture" if you will, and not be short sighted for near term gain at the expense of future requirements and goals.

I will differ with you that snow-flake schemas are in 3NF. What you have are somewhat normalized relations of dimensions to (at best) 1st or 2nd NF fact tables. What you typically have are one or more transitive dependencies that do not depend entirely and completely on the true data keys.

On the 'religious' theme or tone, having faith has nothing to do with it. I believe Relational Models based on third (or higher) normal form implementations work, and work best, for the EDW core, based on evidential experience and empirical statistics. Your experiences are different than mine, so mileage may vary, but I've been involved with many (over 20) implementations of EDW databases that fall anywhere form 200 Gigabytes to 50 Terabytes, and I have never seen the 3NF design fail to meet the business needs or the system requirements when long-term stability, lifetime flexibility, and corporate data management were the driving 'critical success' factors.

You asked (quite rhetorically) "What is the correct business model?" to which you supply your own answer: "In a data warehouse, it's the one that supports the reporting and analysis" and I would say - "NOT!" and nonsense. Reporting and analysis are but only ONE facet of a multi-faceted problem. And to limit a Data Warehouse design to the constantly changing, forever evolving reporting requirements, is to limit the DW's ongoing functionality and usefulness, or forever doom the organization to an eternal re-design process (to the sole benefit of the consulting industry, I might add). You also end up with multiple stars that contain overlapping and redundant data. Which, ultimately lend confusion to the user and violate the 'single source' or 'one version of the truth' DW.

So, how to wrap all this up? I can't really put an end to the differing positions taken, I don't think anyone can or will. Each persons individual experiences are going to dictate their own mind-set for doing things. I have experienced great success seeing things done one way, you, obviously have seen success doing things another. For each solution there are advantages and dis-advantages, strengths and weaknesses, and any implementation ultimately goes to risk management, to the extent that the pluses will outweigh the minuses.

Peace,

Michael

Michael E. McBride
Teradata Database Administrator
Teradata Certified Master
American Eagle Outfitters



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023