Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Fri, 27 Jan 2006 @ 22:02:29 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Erwin is not mi amigo again...
 
From:   Anomy Anom

<-- Anonymously Posted: Friday, January 27, 2006 17:00 -->

Hence my assertion that MULTISET = DUCT TAPE. We all know what duct tape is used for.

Here are my observations:

PI's are great for data distribution but make lousy unique identifiers; However, the bigger the table the better it is to have a UNIQUE PI. Sounds contradictory but the savings can be realized by having queries utilize the PI; after all, table scans on big tables, even partitioned ones can be very costly. Best to educate folks on using a PI on the big ones all the time.

As a rule, I keep tables below 100 rows with the same PI. 2,000,000 row inserts are not a big deal and the data distribution makes up for it on queries.

USI's are the KISS OF DEATH. They make up for the lack of a PK at a very high price of dozens and even 100s times more cost to do processing. I don't use them at all. NO USIS or MULTISETS allowed in my Datawarehouse. Nein!

I started working with RDBMS since 1989 and have not had a need to have duplicate rows in any of my database designs table. I used INGRES, SYBASE, ORACLE, DB2, and now TERADATA. Multisets can get one into a pickle of data corruption. I have seen my fair share of databases w/o data models riddled with data corruptions and RI problems. No one has a blueprint to properly maintain the database schema, so they add and remove columns at will.

It's bad enough TD does not have the concept of a PK that does not cost a first born to use, so now enter Multisets as another tool in the arsenal to make the disaster happen faster and get one into the pickle of data integrity problems.

Most MULTISETS can be eliminated by using due diligence. I have a "user" with a 2B row table and an USI that took 2 hrs to update. I asked them to get rid of the USI but the duplicate check on the PI still made it last almost 1 hour. Their solution was to go MULTISET to avoid the dup check but what to do about RI????? I suggested a PPI and now it takes 15 minutes and many of their queries take advantage of dynamic partition elimination. A little outside the box thinking went a long way.

If you must use a MULTISET use them with plenty of caution.


Anonym



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023