Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Mon, 11 Nov 2002 @ 20:58:29 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: 3NF vs Dimensional modelling
 
From:   Mike Buckland

I love this topic. I tend to check this list about once every 2 weeks to catch up on recent discussions. However I received several messages to my NCR email to check it out (thanks to Jim, Joe, and Alan). I'm one of the better known heretics within NCR that has been a long time advocate of dimensional modeling on Teradata. My preference comes down to 3 reasons:

1- Query performance will almost always be better with a dimensional model vs. 3NF model. The exceptions that I've found tend to be in the really trivial queries. The indexed query that accesses a dimension table or 2 can run faster on a 3NF implementation, but not much else. The overhead of joining additional tables to get all of the dimensional data necessary can raise the execution time. No, performance isn't everything, unless you aren't getting the level that the business needs, then it's the only thing (apologies to Vince Lombardi). And it's a lot easier to get there on a DM. By the way, after some testing over the last few months I no longer totally accept the "joining between fact tables on a Star is horrid" argument that I once did. I would really like to see somebody else's numbers on this, but in my testing there's just not any difference.

1a- A related reason for preferring a dimensional model is that the execution times tend to be more predictable. An query that joins a total of 4 tables is much less likely to get "lost" and take a strange execution path than one that joins 8 tables. This is especially true in a situation when statistics haven't been collected recently. Even when statistics are current you're much more likely to get join plans with "high confidence" with a dimensional model and the inherent simplicity.

2- The overall model will be simpler. This simplicity gives 2 great benefits -- it's easier for an analyst or developer to understand and it's easier for a tool to generate SQL. Even when covered with a GUI tool there will be times when the model must be understood. New reports needed, the developer when built the original application is long gone, and errors found in a report that's been used for a long time are just 3 instances when a model must be revisited. A simpler dimensional model gives a leg up in a couple of ways: not is it simpler to learn, but there are just many more people out there who have done that kind of thing before. One place I disagree with Neil: "why should I care if takes 10 pages of SQL to satisfy a query, so long as it only takes a tenth of a second to generate". I think it's very important how complex the SQL is if I'm the DBA who's going to have figure out why the results are wrong, or the analyst who has is basing has job on understanding the report that's being generated.

3- The maintenance will be less. This is the one where I have changed my mind over the last year or so. One constant of the successful data warehouse is change. Finding a contractor who has written Cognos reports against a normalized database is hard. Most colleges now train people by the hundreds who understand the basics of how to navigate a dimensional model. Over the long term that is a not an insignificant expense.

Bottom line: Intelligently done, you can bring value to the business no matter what approach you take. As a platform Teradata allows a diversity of approaches, all of which have solved and can continue to solve business problems. Choosing the "correct" approach is difficult, choosing one that can be made to work is simple. However it's just easier to get there if you start with a dimensional model.

Mike Buckland



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023