Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Mon, 30 Jan 2006 @ 13:36:32 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Use of Multiset for Large Table
 
From:   Dunweber, Ole

Ulrich Arndt wrote:

  - but you need to assure that your process is not loading a data set twice by mistake (for example run the insert / select twice in an restart situation)!  


I totally agree.

But I would also like point out, that deliberately using "duplicates" can be very useful.

For example, it can be used to (very) quickly correct data-load-errors, or remove "1 instance" of a data set loaded twice by mistake:

Consider the following simple sales table example:

     CREATE MULTISET TABLE SALES_TABLE
          (
           ItemKey INTEGER NOT NULL,
           SalesDate DATE NOT NULL,
           StoreKey INTEGER NOT NULL,
           BatchNo SMALLINT NOT NULL, -- used for audit
           QtySold INTEGER NOT NULL
     PRIMARY INDEX ( ItemKey ,SalesDate )
     Primary key = (ItemId,SalesDate,StoreId)).

If you get bad data loaded, just run the bad data "negated" through your ETL-process:

"Negated" means:

     QtySold = - QtySold
     BatchNo = - BatchNo (for audit-reasons).

This way the data gets loaded into the base-table and any summary tables you may have on top of the sales_table.

And then you can insert the valid data whenever it arrives.

Note: this assumes, that any query on the table uses aggregations - that is when I want the QtySold, I must code:

     SELECT ItemKey, SUM(QtySold).

As we use Microstrategy this is an automatic option, I bet other BI-tools can easily do the same.

If you don't want the superfluous rows in the table, you can delete them at a later time.

So, my point of view is:

- MULTISET tables are very useful.

- The DBA(s) should not be in the power to disallow their use.

- Any Data warehouse with a well-designed ETL-process (and any great DW needs such a one), can easily and quickly correct errors due to erroneous duplicates in MULTISET tables.

- It's better to gain the great performance-benefits of using MULTISET tables each and every day, and live with a tiny risk of errors from time to time. As long as you can correct the errors quickly.


Best Regards,

Ole Dunweber
Coop Nordic



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023