Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Fri, 09 Nov 2001 @ 20:16:58 GMT



Subj:   Re: Data Model with Microstrategy as frontend tool
 
From:   John Grace

Krishna,

I understand some of you may not appreciate "vendor" comments submitted to this forum. So, I will try to temper my bias. The issue(s) raised throughout this thread are of particular interest to me and some of the suggestions, although perfectly valid, fail to explain exactly what Teradata is doing and why. (Also, I was a Teradata user long before I became a vendor.)

First, you may (or may not) be interested in my background. I am the president of Geppetto's Workshop, which was one of the original firms that Teradata (the company) turned to to develop solutions for its first customers. In 1996, NCR hired us to add OLAP capabilities to Teradata. We supplied technology (patents and software - The Ant Colony) to NCR, which eventually became TeraCube, NCR's OLAP Server for Teradata. One interesting aspect of The Ant Colony is the ability to generate SQL for a wide variety of data structures, including 3NF, star, and snowflake. Our initial products were all designed and built to run in several-hundred-table, 3NF Teradata data warehouses. We were also awarded the patent for a process we call dynamic "table selection" and "aggregate awareness", in which our software can select the appropriate tables based on the context and aggregation of the question. So we have some history here.

Second, I am surprised when I read comments from NCR representatives discussing the relative advantages and disadvantages of 3NF versus star or snowflake. I would agree that Teradata excels in environments that use 3NF designs, but if I were NCR I wouldn't concede anything in the non-3NF arena. In fact, most of the original Teradata decision support implementations (late eighties to early nineties) were snowflakes on steroids (100 to 1,000 tables, multiple facts, many aggregates), although, at the time, we didn't know what snowflake schemas or data warehouses were.

The problem Teradata can have with 3NF, star, or snowflake designs is that bad SQL can lead to performance problems. This problem is more acute with Teradata than with other RDBMSs, because Teradata offers fewer alternatives for overcoming bad SQL. Anyone with knowledge of the CREATE INDEX command and enough disk space can optimize an Oracle database: just keep adding indexes and hope for the best. That strategy doesn't work on Teradata, and the reason has to do with parallelism and how Teradata stores and accesses data. Teradata can't join on secondary indexes; it will either use the primary index to join tables (good) or scan the table (bad). Teradata doesn't really have a physical primary index (now this is important): the primary index is the ORDER in which the table is SORTED, then DIVIDED, and finally WRITTEN across the AMPs. Secondary indexes are, kind of, disasters on Teradata, because they tend to bounce around from AMP to AMP, which makes secondary indexes inappropriate for joining.
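As a sketch of what I mean (table and column names here are hypothetical, just for illustration), the primary index is declared in CREATE TABLE and determines how rows are hashed and spread across the AMPs; it is not a separate physical index structure:

```sql
-- The PRIMARY INDEX clause drives row distribution across the AMPs.
CREATE TABLE customer
( customer_id   INTEGER NOT NULL
, customer_name VARCHAR(60)
) PRIMARY INDEX (customer_id);

CREATE TABLE sales
( customer_id INTEGER NOT NULL
, sale_date   DATE
, amount      DECIMAL(12,2)
) PRIMARY INDEX (customer_id);

-- This join matches both primary indexes, so matching rows hash to the
-- same AMP and the join runs fully in parallel (good). Joining on any
-- other column would force redistribution or a full table scan (bad).
SELECT c.customer_name, SUM(s.amount)
FROM customer c
JOIN sales s ON s.customer_id = c.customer_id
GROUP BY 1;
```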

3NF designs typically have more tables with fewer columns, so the odds are better that the join columns are going to match the primary indexes. These tables are well distributed across the AMPs and accessed in parallel (each AMP performs an equal amount of work). You get great performance! De-normalized structures tend to have fewer tables and more columns, hence more possible join columns. The join columns no longer match the primary index, and the optimizer tends to perform more full table scans. Since Teradata can't join on secondary indexes, the technique of piling on secondary indexes fails. I worked with the Teradata optimizer team to resolve this problem, and NCR has introduced new types of indexes (star indexes and such) to offer DBAs more options. But, even today, using the primary index to perform table joins, coupled with the correct SQL, is an absolutely untouchable combination on Teradata for 3NF, star, or snowflake designs.

A simple suggestion to improve star and snowflake performance: think about simplifying the primary index on your large fact tables to more realistically match real-world querying of the table. You need to balance the desire for perfect data distribution with the desire to improve query performance.
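For instance (a hypothetical fact table, two alternative definitions shown side by side), a wide multi-column primary index gives near-perfect distribution but rarely matches how the table is actually joined; a simpler primary index on the dominant join column may accept a little skew in exchange for PI-based joins:

```sql
-- Alternative 1: excellent distribution, but queries rarely supply
-- all three columns, so the PI seldom helps access or joins.
CREATE TABLE sales_fact
( store_id INTEGER, item_id INTEGER, sale_date DATE, qty INTEGER )
PRIMARY INDEX (store_id, item_id, sale_date);

-- Alternative 2: simpler PI on the most common join column; slightly
-- less even distribution, much better chance of primary-index joins
-- to the matching dimension table.
CREATE TABLE sales_fact
( store_id INTEGER, item_id INTEGER, sale_date DATE, qty INTEGER )
PRIMARY INDEX (item_id);
```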

Third, another problem we frequently see with our Teradata customers is the heavy use of Views. Throughout this thread, Views are suggested as a "solution" to simplify the complex SQL-generating tasks of query and OLAP tools that have SQL-generation limitations.

I know you are asking, "What's my problem with Views?" Well, pick a view from your database with, say, 10 tables. Write a simple SQL query against the view and look at the EXPLAIN. All the tables from the view are included in the EXPLAIN. Teradata is doing significantly more work than is required to answer your query. (Teradata is not unique here; every database does this.) "Optimizing" techniques on 3NF data models tend to yield an abundance of complex views that may only run efficiently for a narrow range of queries. De-normalized designs have the same problem. Combine a large fact table with 5 or 10 dimension tables in a View, and each time the View is accessed, all the tables are included. A simple query against the View may yield worse performance, because the View may include large, unconstrained dimension tables! A more complex query against this View may lead to better performance. (Interesting?)
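You can see this for yourself with a small experiment (view and table names below are made up). Because the view's inner joins can filter or multiply rows, the optimizer must include every table in the view even when your query only touches one of them:

```sql
-- A view joining a fact table to several dimensions:
CREATE VIEW v_sales AS
SELECT f.sale_date, f.qty, s.region_name, i.brand_name
FROM sales_fact f
JOIN store_dim s ON s.store_id = f.store_id
JOIN item_dim  i ON i.item_id  = f.item_id;

-- This query only references columns from sales_fact, yet the EXPLAIN
-- will still show joins to store_dim and item_dim:
EXPLAIN
SELECT sale_date, SUM(qty)
FROM v_sales
GROUP BY 1;

-- Querying the base table directly does only the work required:
EXPLAIN
SELECT sale_date, SUM(qty)
FROM sales_fact
GROUP BY 1;
```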

I want to make this clear: I don't have a problem with Views. Views have an important role in RDBMSs, but Views, materialized or otherwise, were not intended to replace good SQL or an efficient optimizer. Each SQL query should be written to include the minimum number of tables required to answer the question.

OK, what's the bottom line? Teradata doesn't care whether you choose 3NF or some other de-normalized strategy, so let's hear no more discussion of this topic. The selection of the primary index for access versus distribution has a much more significant influence on the performance of your Teradata system than the degree to which you de-normalize your data. Also, minimizing the amount of work required for any given query can significantly increase query throughput (use good SQL, not monolithic views).

John Grace
President/CEO
Geppetto's Workshop L.L.C.



Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023