Home Page for the TeradataForum

Archives of the TeradataForum

Message Posted: Fri, 08 Nov 2002 @ 21:08:46 GMT

  <Prev Next>   <<First <Prev Next> Last>>  

Subj:   Re: 3NF vs Dimensional modelling
From:   Neil Raden

You wrote:

  I Agree with you Michael. A 3NF model is the best for the core of a DW, in our project We have 2 models, a base model in 3NF with some denormalization coming from business necessities and a Dimensional model to implement a datamartdoing so we will have the ability to answer to the user necessities in a better way.  

With all due respect, it sounds like you avoid the religious war by insisting on your religion :-)

One very major problem with the 3NF is that it isn't very useful for satisfying queries directly, leading to downstream data marts, lots of views to build and maintain and lots of DBA work. Now granted, Teradata is a little more capable of resolving these queries than the merchant databases, but it is still an arduous task to traverse all those tables and joins to answer a simple question.

But there is a much bigger problem. How do people get access, in this environment, to detailed transactional data on a same-day basis? Lots of operational reporting requires this, and can't wait until tomorrow. The solution for the 3NF DW crowd was the ODS, which, coincidentally, sounds a little like odious. At least to me.

The ODS is an artifact of a design approach that has lost its currency, and was based on the limitations of technology 10 and even 15 years ago.

It was a reaction to shortcomings in what had come to be known as the Corporate Information Factory. The basic premise of the CIF is that data is PERIODICALLY dropped off at the back door of the data warehouse, checked in, scrubbed, "lightly aggregated" and inserted into an Enterprise Data Warehouse Schema. Because this schema is largely a 3NF design, it isn't suitable for direct query, therefore, more processing is needed to create more-summarized "data marts" that are cubes, non-conforming subject-area-limited star-schema, and a host of other "end user" tools, including even Access and Excel.

The problem is that all of this takes time. Some MOLAPs can take hours to load, and in many cases, too many hours to support daily reporting. Because no one can query the EDW directly, or at least, not easily or efficiently, everyone has to wait for the CIF to complete its periodic batch processing which may or may not take longer than a business day. This is where the ODS comes in.

The data marts of the CIF just don't have the detailed, intra-day information needed for the ongoing operational reporting requirements of organizations. The concept behind the ODS was to provide, within the data warehouse environment, a place for people to get at this kind of data quickly (there are arguments that the data warehosue has no business even being involved in this function, but the distinctions between analytical and operational reporting have grown largely incidental). For that reason, the ODS data was not scrubbed and integrated like the EDW data, and it wasn't modeled as carefully, either. In fact, in most cases, the ODS schema were just copies of the tables in the operational systems. Obviously, integrating this ODS data with other data in the data warehouse is out of the question, as little to no attempt is made to conform the three incompatible schema: ODS, EDW, multiple data marts. If you create an ODS with A/P postings, Payroll transactions and inventory movement, you can be sure that the ODS will not allow reporting across these subject areas. The ODS may look like a happy little box on the diagram, but inside is a lot of sloppy thinking and lack of integration.

Another problem is that the data in an ODS is typically wiped out with successive loads, and diagrams that show ODS data flowing to the EDW are misleading. To the extent that the EDW handles the same data as the ODS, it has to be handled twice (three times if you count the need to spawn data marts). All of this redundant processing takes time (including expensive development and maintenance time).

Now two things have happened to make the ODS superfluous. First, Ralph Kimball explained in detail in some articles four years ago in Intelligent Enterprise (it may have still been DBMS, I'm not sure) how to eliminate the need for an ODS entirely by building your dimensional models at the lowest level of detail. This detailed, CURRENT data is appended to the data warehouse in the same way as anything else (Even a few years ago, this would have been tricky, but current releases of the major RDBMS's have made this process almost seamless, too). I won't describe the whole process of rolling snapshots and so forth, it's all clearly explained in Ralph's books and articles, but the point is that the artificial and unnecessary distinction between the ODS and the data warehouse causes lots of extra work and expense, and contributes far less value.

We've built many data warehouses where we append detailed, "operational" data to a dimensional schema at intervals of 15 minutes or less. The instant the data hits the warehouse, it's available to everyone, and we have one, easy-to-understand schema covering intra-day, daily and historical data, and the process is incremental and very fast. We've started to move forward to support BAM/BPM applications with this technique and see it as the bedrock on which we'll help our clients move to real-time Business Intelligence applications in the next twelve months.

In my opinion, the ODS is a dead-end and will only lead to more work and more latency.

So I would judge the 3NF approach on what it can and can't support architecturally. In the case of intra-day reporting, it's dead in the water.


Neil Raden
Hired Brains, Inc.
Montecito, CA 93108

  <Prev Next>   <<First <Prev Next> Last>>  
  Top Home Privacy Feedback  
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 28 Jun 2020