Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Sat, 09 Nov 2002 @ 17:18:21 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: 3NF vs Dimensional modelling
 
From:   Neil Raden

Michael McBride wrote:

  Well, I did not want to go down this path, but I feel forced into in, if only because I am a hard-core, diehard, 3NF implementation bigot from way back.  


If you admit to being bigoted, why not keep those opinions to yourself? Where I grew up, bigotry was considered something to avoid, not be proud of.


  I have read all the Kimbal idealogy that had, at its core, the one ulterior motive, to sell his own endorsed DBMS products.  


I think it's fair to make that claim, but I don't think it's valid. True, Kimball came out of Metaphor, where he learned about the star schema and ROLAP, then started Red Brick which, by the way, was one fine piece of technology. I have clients still using it, years after the company vanished inside Informix and then IBM. And Ralph also tied up with Sagent for a while, clearly a company in the DM camp.

But you should give the thousands of us out there who use this stuff a little credit - we (or at least many of us, including me) profit not at all. In fact, if I went the CIF route, I'd make a helluva lot more money in my consulting practice, because it would take millions to do what we do on hundreds of thousands.

And think this one over - don't you think there are influencial people who have been profitting for years selling their own and/or endorsed products by pitching a 3NF, CIF architecture? Start at the top of the list of DW luminaries and work your way down.

Let me tell you what my ulterior motive is - to help my clients be successful by designing and implementing the best solutions. I've been at this for 20 years, and I know how to design any kind of model. I do what works.


  First off, I have never been in favor of ODS, as a part of a EDW, but I am in favor of Active Data Warehousing, and the data WILL always get into the DW faster in 3NF than in the DM, that my friend, is a fact.  


Please show me a benchmark. And while you're at it, consider this: a DM with snowflaked dimensions IS IN THIRD NORMAL FORM!!! So when you start doing that 3NF vs DM thing, try to keep that in mind, because it's a non sequiter.

Also, "get into the DW faster" has to include the objects that people query. Also, the performance hit they take on each query because of views, lack of join indexes, etc.


  Point of fact, well designed 3NF models based on the true business subject matter will perform at the expected, reasonable performance criteria 90-95% >of the time.  


Totally wrong, misleading and unsubstantiated. I've been doing these benchmarks for years and I can assure you the opposite is true. Besides, rarely do people actually get to query these beasts, so what exactly do you mean "perform"?


  The key principle here is that the implemented model 'correctly' reflects the business (BQ, and BR) in all areas. I have seen horrible business models that have perfect 3NF implementation as applied to technique and theory. But the model did not reflect the business, this is almost always the case when I here statements like the ones you made such as: "One very major problem with the 3NF is that it isn't very useful for satisfying queries directly" and "Because no one can query the EDW directly, or at least, not easily or efficiently" and "So I would judge the 3NF approach on what it can and can't support architecturally. In the case of intra-day reporting, it's dead in the water."  


I don't even know where to start here. What is the correct business model? In a data warehouse, it's the one that supports the reporting and analysis needs. All of those circular relationships in a transactional are useful for capturing the atomic transactions of an operational system, but when you integrate data from multiple sources, those models are irrelevant. Dimensional models are the ones that reflect the business FROM THE ANALYSTS POINT OF VIEW. And the reason that no one can query these schema effectively is because they lack compound indexes and have too many tables. It isn't philosophy, it isn't rocket science, it just works.

The query optimizers in most databases simply can't handle analytical queries against a 3NF schema, they never could, that's we started using a DM in first place. Wouldn't it be a wonderful world if we could use one schema? And to make matters worse, all of the database vendors other than Teradata (Oracle, Microsoft, IBM UDB, IBM Informix) have been working hard for the last five to improve their optimizer technology FOR DM SCHEMA. So I don't expect to see any big improvement query performance against 3NF schema anytime soon in those products.

The 3NF model can't support intra-day reporting because it needs the ODS (or I should say, the CIF architecture does, of course you could insert real-time records into a 3NF design). The ODS stinks. It's a bad idea. Can you insert intra-day records into a 3NF schema? Sure, but you still can't query it, you have to move it out to data marts, and that's what's wrong with the whole idea.


  How open minded is that last statement? I would cetainly require verification of that one, since I have replaced many DM with 3NF implementations that out-performed the DM for all but the most basest of "canned" queries (for which the DM was tuned for). And I have improved data availability by hours in every case.  


My guess is you replaced some poorly-implemented ones, lord knows there are plenty of those out there. Plus, re-do's always involve new upgraded hardware, etc., so I'm not sure this an apples-to-apples observation. And data availability is a tricky issue. Just having data in a database doesn't make it available. Every 3NF EDW I've seen relied on views, aggregating and subsetting the data into data marts for queries, and only batch queries for the SAS rocket scientists in the basement were allowed against the main DB.

Again, I know Teradata resolves queries against OLTP schema more efficiently than other databases, so we can make an exception there.


  Bottom, line, the bad-rap that 3NF has gotten in the DW industry, is because too many people trained in a technique do not understand how to apply the technique to the art of business modeling. One can learn the techniques to paint oils on convas, but it takes talent and experience to be considered a master artist. (feel free to quote me on that one, I made it up on my very own a long time ago) Basically, the fault usually lies in the model itself and not the technique used. Granted, it is easier to mask a missing business rule in a huge fact table with tons of dimensions, but, as harsh as it sounds, I consider that just plain lazyiness.  


Your bigotry is manifest. Consider that there may be some artistry in DM modeling, too. It's also easier to mask all sorts of things in a design with 600 tables. Do you have any idea why Codd used the phrase "normalization?" If you don't, the answer is going to disturb you.


  So, more to the point, the EDW is only as good as the foundational model (or modeler), both in technique AND functionality, as the case may be.  


Actually, I think it's only as good as the customers think it is, and they want availability, completeness, understandabilty and performance. Im my practice, I find the overhead of a dead database too much to deal with.


  I will say that DM has a place and purpose (niche) in a Total Data Warehouse Solution, just not for storing the Corporate Data Assets at the core of an Enterprise Data Warehouse.  


  (I'm quite sure I'm closed minded on that thought though)  


  ...and the 'religious' wars go on...and on...and on...  


I don't love it, I find it frustrating. And I also don't like the characterization that I have some religious attachment to the idea. I don't. It just works. But when I talk to 3NF'rs, inevitably they appeal to some faith that this is the only true path and you gotta have faith.

Well, I don't. normalization was a technique to get relational data bases to perform as OLTP platforms, because they were too slow. When Codd devised what we now call normalization, he didn't have a word for it.

It was 1973. And you know what he said? He said, "Nixon is normalizing relations with China, so I guess that's what I'm doing, normalizing my relations."

So there you have it, the NORMAL form is an expression that was totally arbitrary. He could have called it Third Chickenfat Form.


-NR

Neil Raden
Hired Brains, Inc.
Montecito, CA 93108
www.hiredbrains.com



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023