Archives of the TeradataForum
Message Posted: Sun, 10 Nov 2002 @ 05:10:34 GMT
Hi all. Please let me toss another log on the fire.
In my opinion, the limitations of SQL are where this debate originates.
Let me start off with a quote from Erik Thomsen:
"It is normal to want to be able to phrase analytical queries in a natural or near-natural way. If what you want to see are product totals and subtotals or this year/last year comparisons or moving 52 week averages or top-selling products or changes in market share between this year and last year or percent changes in margins, that's all you should have to ask for. Spreadsheets let you phrase analytical queries in a near-natural way but only for small amounts of data that can be organized in a two-diensional grid. SQL lets you phrase simple queriies based on columns of data in a near-natural way as long as they do not involve any analysis or interrrow comparisons. Neither approach lets you easily phrase analytical queries against large amounts of complexly organized data."
In my opinion, the dimensionsonal models became popular because of the inability of SQL to easily express 'analytical queries in a natural or near- natural way'. By focusing on the analytical problem as a starting point the dimensional model is often very succussful. However, the dimensional model runs into the limitations of SQL as well - let me throw another quote to back this up - this time from Ralph Kimball - regarding drill- across - the combination of multiple fact tables in one analysis - e.g. orders, shipments, deliveries, inventory, returns ...:
"...If each of the numerical columns in this report is a separate query, then the report is built from a simple merge-sort process on the row headers of each of the four answer sets. A relational DBMS calls this process outer join. This process is so basic that we give it a design principle: A query tool fetching multiple answer sets and combining them must always combine these answer sets using outer joins. The outer joins are almost all two-way (symmetric) outer joins."
Kimball calls this process simple - but it often times is not, especially when the answer sets aggregate to different levels, or one or more of the result columns is semi-additive (distinct counts etc..) or if the result sets are quite large. In any case, post processing the results of a multiple full outer join defeats the goal of phrasing an 'analytical query in a near natural way'.
So if the problem is really SQL then neither Dimensional nor Entity Relationship models offer an ideal solution. Either approach often leads to compromises and less than elegant architectures in order to provide the necessary analytical capability.
We need our data management product software engineers and gurus to move us beyond SQL - and I don't necessarily mean new aggregate or user defined functions in SQL. In my opinion these do not get us significantly closer to the 'ability to phrase analytical queries in a natural or near-natural way'.
I am seeing some very interesting products (available and emerging) that begin to move us beyond the limitations of SQL (and debates about Dimensional v. Entity Relationship) and let us concentrate more on business analysis.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|