|
|
Archives of the TeradataForum
Message Posted: Tue, 31 Jan 2006 @ 12:48:33 GMT
Subj: | | Re: Use of Multiset for Large Table |
|
From: | | Dunweber, Ole |
Hello again,
Entertaining subject, one final reply from me:
Anomy replied to me:
| <SNIP>How many business questions can you answer (using the PI), if you have put say 10 columns into it, to make it
unique?<SNIP> | |
| In 17 years of designing and working with Relational databases I never encountered a "natural identifier" that was 10 columns long. 5,
maybe 6 at most and in some those instances a surrogate key was used for inheritance purposes (FK). The particular table in question with 20B
rows has 3 columns as the UPI, all 3 columns are FKs to dimensions. In a properly design Dimensional Model, it is very easy to get the dimensions
that make up the PI of the large table into the SQL picture and help the OPTIMIZER put 2+2 together. The key is to ensure people know the data
model, know how to properly use the tables, know the nuances that Teradata's PIs play in access to the information, and know the nuances that
adding context in the form of tables supplying FKs to the PI definition can help the optimizer make the connection to use the PI (whether UPI or
NUPI, makes no difference). | |
How about business questions only specifying 2 of the dimensions in the UPI? Are they not sufficiently educated?
| If the table has 20B rows (1Tb) I have to do something to speed up inserts while preserving the integrity of the data. A PPI and a
USI make it easy to maintain without sacrificing access and without the use of a multiset definition. | |
This seems to contradict what you write later on:
| USI's are the Kiss of Death for performance. | |
Regarding PPI speeding up inserts:
Yes, but only if the partitioning matches the arrival pattern of the data.
I doubt you can accomplish this on all your big tables - at least not without putting say the date-column redundantly into some of the tables,
where it does not belong 3NF-wise.
Also this advantage lessons with the number of NUSI's and Join indexes on the table.
Using PPI is a fundamental change to the physical design, which must be carefully analyzed with the entire workload.
| <SNIP>Do you really consider the Data warehouse to be yours? In my humble opinion, it belongs to the
business.<SNIP> | |
| I must explain that I am not a DBA (although I play one here). While the hardware and Software belong to the business, I am charged by the
business to be the STEWARD of the DATA WAREHOUSE. As the Steward, I am responsible for its well being and the "owners" expect that. MULTISETS
are not in the best interest of the data warehouse or the business for two reasons: They violate RELATIONAL theory and they have the potential to
introduce serious data integrity problems. The first one is reason enough to keep them out of the data model, the second one violates the premise
of the DW being a single version of the truth (if duplicate rows yield improper results). Teradata already makes it difficult to implement
reasonable Referential Integrity because it requires USI's and USI's are the Kiss of Death for performance. No need to aggravate the situation by
adding Multisets to the mix. Let's remember that a PI is not a PK and DUPLICATE Primary Identifiers are indeed possible under Teradata's
NUPIs. | |
| Anyone with some relational database background that takes the 2 week class in Atlanta can be a Teradata DBA, but not everyone is able to
bring more value to the organization by doing the role of a Data Architect and Data Administrator. In that role I make sure the data warehouse
has a well designed, full blown data model that faithfully represents the business and is implemented in the Teradata box in a consistent state
(RI maintained). I spend a good deal of my time evangelizing the proper use of its contents by promoting the concepts stored in the data
warehouse and ensuring they are used properly. So yes, I sort of "own" the data warehouse because I act as the gatekeeper charged to ensure that
whatever needs to get into the data warehouse gets properly modeled and defined. Nothing goes in or out unless it is modeled first (I do the
modeling, so you get the idea). | |
Yes, you probably bring more value to the business regarding integrity.
But you bring less value to the business regarding performance, time-to-market of new applications, and the flexibility of business questions
which can be answered (with reasonable performance).
Is the business aware of these pros and cons of your strict stewardship?
And I totally agree with Victor Sokovin:
| "The argument that multiset tables introduce an additional risk is irrelevant provided you delegate the responsibility to the ETL and
QA" | |
Regards,
Ole Dunweber
Coop Nordic
| |