
Archives of the TeradataForum

Message Posted: Tue, 14 Feb 2006 @ 10:43:00 GMT


Subj:   Re: Question on Strategy
 
From:   Victor Sokovin

  We have a requirement to limit the database size to 500 GB, since the IT department can allocate only this much space to our data mart. We have to design an ETL process that calculates the size of the incremental load and the size of the existing data; the load should happen only when the total size is < 500 GB. If the total would exceed 500 GB, the ETL process should be able to remove old data so that it can accommodate the incoming data. Is there a best approach for this? Are there any success stories?


In general, I would suggest an empirical approach. Try to identify metrics of the data that you could use to estimate what percentage of the data mart the new data is likely to occupy once it is loaded. What kind of metrics? That depends on the data distribution, of course, but you could start with simple things like the number of rows, the average row size, etc.
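
To make the row-count idea concrete, here is a minimal sketch in Python. The function names, the sample row count, and the 250-byte average row size are all made up for illustration; in practice the figures would come from whatever your ETL tool or staging tables can report.

```python
GB = 1024 ** 3  # bytes per gigabyte (binary convention, an assumption)


def estimate_load_bytes(row_count, avg_row_bytes):
    """Rough size of an incoming load: number of rows times average row size."""
    return row_count * avg_row_bytes


def load_fits(current_bytes, load_bytes, limit_bytes=500 * GB):
    """Return (fits, percent_of_limit) for the mart after the load."""
    projected = current_bytes + load_bytes
    return projected < limit_bytes, 100.0 * projected / limit_bytes


# Hypothetical numbers: 40 million rows of about 250 bytes each,
# arriving in a mart that currently holds 470 GB.
incoming = estimate_load_bytes(row_count=40_000_000, avg_row_bytes=250)
fits, pct = load_fits(current_bytes=470 * GB, load_bytes=incoming)
print(f"incoming ~{incoming / GB:.1f} GB, fits={fits}, mart at {pct:.1f}% of limit")
```

If `fits` comes back false, that would be the point at which the ETL process purges old data before loading.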

The reason I suggest this percentage-based estimate is that the data mart might have dependent structures such as secondary indexes (SI), join indexes (JI), etc., the size of which is not easy to predict exactly. Predicting those sizes exactly is an interesting academic exercise in its own right, but it is probably not practical to implement.
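
One way to fold the index overhead into the estimate without modelling it is to calibrate a single factor from past loads. The sketch below assumes you have recorded, for each earlier load, the raw estimate and the growth actually observed in the mart afterwards; the function name and the sample history are hypothetical.

```python
def overhead_factor(history):
    """Ratio of observed growth to raw estimate over past loads.

    history: list of (estimated_raw_bytes, observed_growth_bytes) pairs.
    Observed growth includes whatever the SIs, JIs, etc. added.
    """
    total_estimated = sum(est for est, _ in history)
    total_observed = sum(obs for _, obs in history)
    if total_estimated <= 0:
        raise ValueError("need at least one past load with a positive estimate")
    return total_observed / total_estimated


# Made-up history of three loads (sizes in arbitrary units).
past_loads = [(10_000, 13_000), (20_000, 25_000), (30_000, 40_000)]
factor = overhead_factor(past_loads)

# Scale the raw estimate of the next load by the calibrated factor.
projected_growth = factor * 50_000
print(f"factor={factor:.2f}, projected growth={projected_growth:.0f}")
```

The factor is only as good as the history behind it, so it is worth recalculating after every load.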

One important thing your metrics must account for is skew. The 500 GB can be filled 100% only if there is no skew, because the space is divided among the AMPs and the fullest AMP is the one that runs out first. That's seldom the case with real data, so you'll have to take into account the skew of the existing data and the skew of the to-be-loaded data. You could set up processes that calculate the metrics on a regular basis and store them somewhere in the metadata part of the mart. The processes could give you a warning when you are short on space on certain AMPs.
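
A rough sketch of such a check follows, in Python. It assumes the limit is split evenly across AMPs and that new data will be distributed like the existing data; both are simplifications. The per-AMP usage list is a hypothetical input (on Teradata it could presumably be derived from a dictionary view such as DBC.DiskSpace), and the skew formula is one common definition, not the only one.

```python
def skew_report(per_amp_bytes, total_limit_bytes, warn_pct=90.0):
    """Summarise skew and flag AMPs that are close to their share of the limit."""
    if not per_amp_bytes:
        raise ValueError("need usage for at least one AMP")
    n = len(per_amp_bytes)
    per_amp_limit = total_limit_bytes / n      # assumption: even split per AMP
    avg = sum(per_amp_bytes) / n
    peak = max(per_amp_bytes)
    ratio = 1.0 if peak == 0 else avg / peak   # 1.0 means no skew
    return {
        # Skew as a percentage: 0 when all AMPs hold the same amount.
        "skew_pct": 100.0 * (1.0 - ratio),
        # Total the mart can hold before the fullest AMP hits its limit,
        # if the current distribution persists.
        "usable_bytes": total_limit_bytes * ratio,
        # Indexes of AMPs at or above the warning threshold.
        "hot_amps": [i for i, used in enumerate(per_amp_bytes)
                     if 100.0 * used / per_amp_limit >= warn_pct],
    }


# Made-up example: 4 AMPs sharing a limit of 400 units (100 each).
report = skew_report([80, 60, 95, 65], total_limit_bytes=400)
print(report)
```

Comparing the projected total against `usable_bytes` rather than the nominal 500 GB gives a more realistic trigger for the purge step.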


Regards,

Victor
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023