Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Mon, 11 Nov 2002 @ 19:11:30 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: 3NF vs Dimensional modelling
 
From:   Neil Raden

Michael,

You wrote:

  First I want to be on record as also having 20+ years experience in the Information Systems industry, initially as a 'report' programmer, and, after earning my BS in CS for system design in 1988, moved into data architecture and database design, and, even though Enterprise (conceptual/logical) Data Modeling is 'platform' independent, I have been implementing NF databases on Teradata since 1989.  


I've already made an exception in the case of Teradata, and your comments would be much more compelling if your experience had been the opposite.


  ...some text deleted
On that note, I firmly believe that business 'applications' are transient and ever changing as the business changes, but the underlying corporate 'data' is forever a 'fixed' asset, and needs to managed as such! - This comes from my Data Administration background.
 


I've been doing this for 20 years, and one of the remarkable things is that the reporting and analysis has actually varied very little in that time frame. The data, on the other hand, is vastly different. ERP (and almost all process-level data), e-Business, Web analytics, just to name a few. Or quality metrics, service level agreements, a whole slew of customer-facing statistics - that data didn't exist 10 years ago. P&L's did, sales reporting, budgets and variance reporting did. So I think you're wrong, it's the reporting that is more durable than the data.

But even that doesn't address your contention that 3NF is is about the data and DM is about the reports. See below.


  ...some text deleted
Relational theory is a mathematical model, and, as such, has mathematical proofs for each level or state of normalization at which the data resides. The higher the degree of normalization, the more mathematically correct (or efficient) state of the data.
 


First of all, the mathematical "proofs" are theoretical, and only apply within the formal system that judges their truthfulness, which maked them suspect (see Goedel). They absolutely do not assert any degree of "correctness," you've made a clear pedantic argument here. I'm a mathematician, or as least was, so we can go to the mat on this one if you like. To use words like "correct" and "normative" is just confusing to people who aren't familiar with formal systems, it endows them with a sanctity they don't deserve.


  I concur with what Geoffrey Rommel wrote: "they are explicitly designed to avoid having to update two or more tables when you are updating only one piece of data, and to avoid the well-known insert, update, and delete anomalies" - end quote.  


This makes me wonder if you or Geoffrey ever really worked with a good DM. I never need to update two tables at one time. And as for the last comment, didn't I see an UPSERT in Teradata? Besdies, I never delete or modify records on my DM's. You should come over sometime and I'll show you how to do it.


  Given, that this is to the advantage of OLTP, is not the same thing as saying NF models are only for OLTP, in fact Relational theory knows nothing of OLTP.  


Totally incorrect. You need to review your history of RDB. RDBs were invented for REPORTING and ANALYSIS, not OLTP. But no one would buy them because they were so slow at OLTP. So Codd had to go back to the drawing board and invent the normal form for one and only one reason - to get those slugs to run a little better. The idea of using them for reporting got lost, and the TPC was born. The rest is history.


  What Relational Theory does suggest (and history has borne the truth of it), data contained in relational database designs, will have the most 'stable', most 'reliable', most 'accurate' and most 'flexible' form of existence.  


It doesn't prove that at all. It asserts it. Besides, DM is a relational database design. So I'm going to challenge you one more time: show me how a snowflaked schema is NOT in 3NF. And if you can't, and I don't believe you can, suggest some sort of other criteria that then distinguishs one type of 3NF from another, and how one can be more "correct" than the other.


  You said: Let me tell you what my ulterior motive is - to help my clients be successful by designing and implementing the best solutions.  


I've been at this for 20 years, and I know how to design any kind of model. I do what works.

The difference is, I do what works 'best' for the problem at hand, not just want 'works'.

You're maligning my practice here, lets not get personal.


  As a consultant, I seek to give a client the 'best' solution for the stated problem. Finding the best business solution hinges on many factors. I always ask the customer some basic and key, fundamental, business questions to determine the lifetime expectancy of the organizations data, the degree of inter-departmental dependencies, the level of detail, and the breath of applications across the entire enterprise, commitment to 'one' version of the truth, and so on. The answers to these, as well as evaluating other functionality issues and requirements will lead me to the 'best' solution both for the immediate need, and for the long term as well. I try to give the client the "big picture" if you will, and not be short sighted for near term gain at the expense of future requirements and goals.  


And what do you think I do? Ask to see their reports? What does this have to do with the points in this thread? You're making a purely rhetorical argument here, which is, "I do the right thing as a consultant, and the right thing is EDW/3NF, therefor EDW/3NF is the right thing."


  I will differ with you that snow-flake schemas are in 3NF. What you have are somewhat normalized relations of dimensions to (at best) 1st or 2nd NF fact tables. What you typically have are one or more transitive dependencies that do not depend entirely and completely on the true data keys.  


Just be specific. If there are formal proofs, lets see them.


  On the 'religious' theme or tone, having faith has nothing to do with it. I believe Relational Models based on third (or higher) normal form implementations work, and work best, for the EDW core, based on evidential experience and empirical statistics.  


Then make your case. Give me some evidential experience and emprirical statistics.


  Your experiences are different than mine, so mileage may vary, but I've been involved with many (over 20) implementations of EDW databases that fall anywhere form 200 Gigabytes to 50 Terabytes, and I have never seen the 3NF design fail to meet the business needs or the system requirements when long-term stability, lifetime flexibility, and corporate data management were the driving 'critical success' factors.  


  You asked (quite rhetorically) "What is the correct business model?" to which you supply your own answer: "In a data warehouse, it's the one that supports the reporting and analysis" and I would say - "NOT!" and nonsense. Reporting and analysis are but only ONE facet of a multi-faceted problem.  


What are the others? What is the client paying for?


  And to limit a Data Warehouse design to the constantly changing, forever evolving reporting requirements, is to limit the DW's ongoing functionality and usefulness, or forever doom the organization to an eternal re-design process (to the sole benefit of the consulting industry, I might add).  


You make the assumption that an analytical focus to the DW leads to these problems. You're confusing two issues. One, a very narrow one that we are discussing, is the data model. The other is how you architect and design a business intelligence environment for an organization. I don't think the latter is dependent on the former, and appealing to the latter doesn't justify one approach or the other.


  You also end up with multiple stars that contain overlapping and redundant data. Which, ultimately lend confusion to the user and violate the 'single source' or 'one version of the truth' DW.  


I don't, and no one who is applying best practices in DM does either.


  So, how to wrap all this up? I can't really put an end to the differing positions taken, I don't think anyone can or will. Each persons individual experiences are going to dictate their own mind-set for doing things.  


That's fine, just don't appeal to some higher authority to justify your point of view, unless you can drag it out and demonstrate it convincingly. Also, your understanding about how enterprise dimensional models are developed and implemented is not very thorough. Many of your criticisms are unfounded. Given that you've been working almost exclusively in a Teradata environment, that's not too surprising, but if I were you, I'd avail myself of some facts before making such sweeping claims.


  I have experienced great success seeing things done one way, you, obviously have seen success doing things another. For each solution there are advantages and dis-advantages, strengths and weaknesses, and any >implementation ultimately goes to risk management, to the extent that the pluses will outweigh the minuses.  


Maybe, but I can add that I have seen many massive failures using the EDW approach. That was the only way we knew how to do this 12-15 years ago. I've learned there is a better way (for most databases) and I've learn the value in slaying some sacred cows once in a while. Most relational database people have a very weak grasp of relational theory, but that doesn't stop them from citing it like scripture.


-NR

Neil Raden
Hired Brains, Inc.
www.hiredbrains.com



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023