 

Archives of the TeradataForum

Message Posted: Tue, 12 Nov 2002 @ 13:08:19 GMT


     


Subj:   Re: 3NF vs Dimensional modelling
 
From:   Robert Doss

Hi John,

Excuse me if I was coming across as evangelical - I try to follow the teachings of St. Mick and Keith: 'it's only rock and roll, but I like it'.

I just like to toss these ideas around and trade experience with smart folks like you.

Speaking of which, I am a great admirer of your company's work on Teracube, and thought it was a great pity that it was caught up in that dubious transaction between MicroStrategy and NCR. I suppose the subsequent SEC investigations etc. buried Teracube forever. It is probably for the best; sometimes these kinds of set-backs force us to re-evaluate our work and come up with even better solutions (as it appears you have). In any case, in my opinion the work you describe is an example of the sort of 'out of the box' thinking that I believe will lead the way to the next generation of analytical applications, and in the process render these debates about ER vs Dimensional models a lot less emotional. I think Kimball is partly to blame for the emotion, with his strident, polarising communication style of manifestos etc.

Let me clarify a few of my ideas and toss in a few observations from my experience with AS and Teradata - observations you might find interesting, and that are slightly different to yours:

1. First of all, to clarify: I don't really have any problem with SQL either. I just think that it is not that great a language/platform for information analysis, and I think the key position of MDX in the architecture you describe is an acknowledgement of that. However, MDX is never likely to be the general-purpose language SQL is; for example, it will never be a language for transactions (with the exception of your company's localcube task thingy, but that's another subject), and for simple to moderately complex set-based operations, SQL is great. Still, I believe MDX, or something like it that was designed specifically for analysis, is a much better solution for analytical work. In my opinion, piling more layers of analytical functions and UDFs onto SQL makes it worse more than it makes it better, at least as long as there is an alternative (i.e. MDX) - but if the developers put them in there, I will use them when it makes sense.

2. MOLAP scalability: In terms of MOLAP builds, AS is about as good as it gets among the current MOLAP technologies, which unfortunately is not saying too much. However, what AS does give you that is a bit unique is partitions, plus a primitive but effective incremental add to those partitions. So, by partitioning by time and adding incrementally, we can isolate smaller parts of the cube for update. With a bit of coding we can incrementally add to only the partitions with current data, and do this in parallel. With a dual-processor box it is practical to get as many as four of these update processes going at the same time if you manage the workload; two in parallel need little or no workload planning. The partitions with historical data only need to be built once. By doing this we have created, and still maintain, very large cubes with full history and detail practically down to the transaction level. We do nightly updates in a short batch window, but there is no reason we couldn't move to almost continuous update. The overall cube sizes are surprisingly small, even for very sparse cubes, and best of all, the query performance is outstanding.
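The scheduling side of that scheme can be sketched in a few lines. This is only an illustration: `process_partition` is a hypothetical stand-in for whatever actually processes an AS partition (DSO calls, a DTS task, etc.), and the partition names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(partition_name):
    # Hypothetical stand-in for an incremental "process partition" call.
    return f"processed {partition_name}"

# Time-based partitions: historical ones are built once and left alone;
# only the partitions covering current data get reprocessed each night.
all_partitions = ["sales_2000", "sales_2001", "sales_2002_q3", "sales_2002_q4"]
current_partitions = [p for p in all_partitions if p.startswith("sales_2002")]

# Up to four incremental updates in parallel on a dual-processor box,
# as described above; two in parallel need little workload planning.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_partition, current_partitions))

print(results)
```

The point is simply that the historical partitions drop out of the nightly workload entirely, and the current ones can be fanned out across processors.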

3. Multiple facts / cubes - Virtual cubes are another very nice AS feature. You state that a weakness of AS is 'the expectation that all measures are in the same cube'. Well, I guess if the one cube is a virtual cube, that is true. But with features like virtual cubes and MDX functions like LookupCube() I have not found any limitation here - I have combined measures from many cubes (eight or more in some cases) into one virtual cube. Maybe I do not understand your point? This was a breakthrough MOLAP feature for me, because it enabled the next point.
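As a rough picture of what a virtual cube buys you (a conceptual Python sketch only - the cube names, measures, and values are invented, and this is not how AS stores anything): measures from several physical cubes that share dimensions become addressable side by side, much as LookupCube() lets a calculated measure reach into another cube.

```python
# Two "physical" cubes sharing (month, region) dimensions, each with
# its own measures. All names and numbers are illustrative only.
sales_cube = {("2002-10", "APAC"): {"Revenue": 120.0},
              ("2002-11", "APAC"): {"Revenue": 135.0}}
inventory_cube = {("2002-10", "APAC"): {"UnitsOnHand": 40},
                  ("2002-11", "APAC"): {"UnitsOnHand": 55}}

def virtual_cube(*cubes):
    """Union the measures of several cubes over their shared
    dimensions, roughly as an AS virtual cube exposes them."""
    combined = {}
    for cube in cubes:
        for coords, measures in cube.items():
            combined.setdefault(coords, {}).update(measures)
    return combined

vc = virtual_cube(sales_cube, inventory_cube)
print(vc[("2002-11", "APAC")])
```

Once combined, measures from either source cube sit at the same dimensional coordinates, which is what makes the multi-star approach in the next point workable.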

4. ER vs Dimensional - The Teradata installations I have worked on with AS have pretty good multi-subject-area ER models. My bias is to try and keep it that way instead of copying data into star schemas. It is challenging, but possible in almost all cases, to build views over the ER model that present 'virtual' dimensional models. This is one of the great things the Teradata optimiser enables. However, even with the Teradata optimiser these views can be very complex, and getting them to perform on cube builds is one of the most challenging aspects. I have a few years of experience with the Teradata optimiser and know a few tricks to get it to behave (although the ability to compile plans would save lots of time - a pet peeve). So far, I have not had to create the kinds of hybrid ER/dimensional databases you describe. One other point with this architecture: AS drill-through scenarios are possible here too, because we are working against the detailed transaction data in the ER model - although most of my work makes heavy use of calculated measures, and drill-through is a bit problematic in that case. One last point on this: while I have avoided materialising a dimensional model, the ability to combine multiple cubes (stars) in one virtual cube made me re-think my prior bias against dimensional. I no longer believe it is so important to have an entity-relationship model. There are still some high-value multi-subject-area integration issues that are difficult to solve in a collection of purely dimensional models, but I can see where the benefits of faster deployment time (not absolute, I know) might lead me to suggest a dimensional model architecture. Although, I still believe an ER model is a better starting point.
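Here is a toy sketch of the 'views over the ER model' idea, using SQLite in place of Teradata and invented table and column names. The point is only that a view can present normalised tables as a flattened, star-like fact row that a cube build can read directly, without the data ever being copied into a physical star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A tiny normalised (ER-style) schema -- illustrative names only.
cur.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, descr TEXT, category TEXT);
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, cust_id INTEGER,
                       prod_id INTEGER, sale_date TEXT, amount REAL);

INSERT INTO customer VALUES (1, 'Acme', 'APAC');
INSERT INTO product  VALUES (10, 'Widget', 'Hardware');
INSERT INTO sale     VALUES (100, 1, 10, '2002-11-12', 250.0);

-- The 'virtual' dimensional model: a view that flattens the
-- normalised tables into a star-like fact row for the cube build.
CREATE VIEW v_sales_fact AS
SELECT s.sale_date, c.region, p.category, s.amount
FROM sale s
JOIN customer c ON c.cust_id = s.cust_id
JOIN product  p ON p.prod_id = s.prod_id;
""")

row = cur.execute(
    "SELECT region, category, amount FROM v_sales_fact").fetchone()
print(row)  # ('APAC', 'Hardware', 250.0)
```

In a real Teradata installation the views join many more tables and the work goes into coaxing the optimiser into a good plan, but the shape of the idea is the same.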

5. Application layer - I am really intrigued by what you have done with this. I suppose we should take this discussion off the list - it isn't all that Teradata-specific. Anyway, I didn't know that AS let you step in between the analytical engine and the relational engine in ROLAP, re-write the SQL, and hand the answer sets back to it. Is this all as seamless as it sounds? I guess as long as you are handing AS what it expects in terms of an answer set (shapes, formats, etc.) it all works fine? One other question - you talk about Web Services: are you using the XML for Analysis SOAP interface, or did you write your own Web Services layer? And are you doing this with .Net or Java?
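For anyone following along, an XML for Analysis request is just a SOAP envelope whose body carries an Execute (or Discover) call, with the MDX inside a Statement element. A sketch of building one - the MDX query and cube names are illustrative only, and nothing is actually posted to a server here:

```python
import xml.etree.ElementTree as ET

# Illustrative MDX statement; the cube and measure names are made up.
mdx = ("SELECT {[Measures].[Revenue]} ON COLUMNS, "
       "[Time].[2002].Children ON ROWS FROM [Sales]")

# An XMLA Execute request is a SOAP envelope around the statement.
envelope = f"""<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command><Statement>{mdx}</Statement></Command>
      <Properties/>
    </Execute>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Check the request is well-formed before it would be POSTed to the
# provider's XMLA endpoint over HTTP.
root = ET.fromstring(envelope)
stmt = root.find(".//{urn:schemas-microsoft-com:xml-analysis}Statement")
print(stmt.text)
```

The response comes back as a SOAP-wrapped multidimensional rowset, which a custom Web Services layer would then reshape for the client.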

Thanks very much for the thought provoking post.

Happy trails,

Bob Doss
Spirit Lake Consulting Pty. Ltd.
Roseville, NSW
Australia 2069



     
 
 
 
 
 
 
 
 
  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023