Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Wed, 15 Feb 2006 @ 10:45:50 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Question on Strategy
 
From:   Stover, Terry

The most expedient approach would be to delete one day, load one day (or week / month, whatever your load cycle is). Leave enough free space in the db that you have space for the largest day ever plus a safety margin, and put in a semi-manual space tracking process. Essentially tomorrow's delete would be based on today's load.

I've found that the law of large numbers applies, the average row size doesn't vary much once you have a reasoble amount of data loaded. You just need to balance the incoming and outgoing row counts. As Victor mentioned, JI's pretty much require an empirical approach, you'll have to load the data and create a model.

My recommended approach is to reassess the requirements. How much history, at what level of detail do the datamart users really need? Can you create views in your datamart db that reference the detail tables in the data warehouse, limiting the need for storing redundant detail data? Given the space constraints, should you be dropping some tables or columns from the design? You can sell dropping columns from the design as vertical partitioning. The less frequently used attributes are in a separate table to be loaded in phase 2, as space permits.

Other Considerations:

NUSI's are 8+ bytes per row (10 for a ppi table), so minimize their use.

Seriously consider using surrogate keys. You can't compress PI columns and there's not many dimensions that need more than the 2 billion values supported by 4 byte int.

On one space constrained system in the past (again another dbms) we created a customized calendar table using smallint (2 bytes) identity column instead of date (4 bytes). It can have a pretty dramatic impact especially if your fact tables have many date columns. The folks in charge of the data model won't like that approach though.

I'd probably go for aggregate tables rather than JI's. That would you to allow rolling data out of the detail but keeping it at the summary level. Since you won't have the detail to reroll the summaries you may be limited to summarizing by time.

Maximize use of compression and be aggressive with your data types. I've seen lots of source systems where the INT columns could have been SMALLINT or BYTEINT. I've generally seen 25-40% space savings.

Do some serious analysis on the source data. You will likely find that some columns are not reportable due to data quality issues or so highly skewed that they're minimally useful.



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023