Archives of the TeradataForum
Message Posted: Tue, 14 Feb 2006 @ 10:43:00 GMT
Subj:  Re: Question on Strategy
From:  Victor Sokovin
  "We have a requirement to limit the database size to 500 GB, since the IT
  department can allocate only that much space to our data mart. We have to
  design an ETL process that calculates the size of the incremental load and
  the size of the existing data; the load should happen only when the total
  size is < 500 GB. If the size exceeds 500 GB, the ETL process should be able
  to remove old data so that it can accommodate the data that is coming in.
  Is there a best approach for this? Are there any success stories?"
In general, I would suggest an empirical approach. Try to identify metrics of the data that let you estimate what percentage of the data mart the new data is likely to represent once it is loaded. What kind of metrics? That depends on the data distribution, of course, but you could start with simple things like the number of rows, average row size, etc.
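As a rough, minimal sketch (not from the original post): if the incremental feed is staged in its own database before the load, you can compare its permanent space with that of the mart directly from DBC.TableSize. The database names STG_MART and SALES_MART below are hypothetical placeholders.

    /* Rough growth estimate: staged incremental data as a percentage of the
       current data mart size. STG_MART and SALES_MART are hypothetical names. */
    SELECT CAST(stg.StgPerm AS DECIMAL(18,2)) / mart.MartPerm * 100
           AS PctGrowthEstimate
    FROM  (SELECT SUM(CurrentPerm) AS StgPerm
           FROM   DBC.TableSize
           WHERE  DatabaseName = 'STG_MART') stg
    CROSS JOIN
          (SELECT SUM(CurrentPerm) AS MartPerm
           FROM   DBC.TableSize
           WHERE  DatabaseName = 'SALES_MART') mart;

This only counts the perm space of the staged tables themselves, so treat the result as a lower bound and adjust it with the empirical percentage described above.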
The reason I suggest this percentage-based estimate is that the data mart might have dependent structures such as SIs, JIs, etc., whose sizes are not easy to predict exactly. Predicting them precisely is an academic exercise that is interesting in its own right, but it is probably not practical to implement.
One important thing your metrics must take into account is skew. The 500 GB can be filled 100% only if there is no skew. That is seldom the case with real data, so you will have to account for the skew of the existing data as well as the skew of the data to be loaded. You could set up processes that calculate the metrics on a regular basis and store them somewhere in the metadata part of the mart. Those processes could warn you when you are running short of space on certain AMPs.
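For the per-AMP warning, a minimal sketch along these lines could be run after every load (SALES_MART and the 90% threshold are assumptions, not from the original post):

    /* Flag AMPs where the data mart's permanent space is more than 90% used.
       Skewed data will hit the space limit on these AMPs first.
       SALES_MART and the 0.90 threshold are hypothetical. */
    SELECT Vproc
          ,CurrentPerm
          ,MaxPerm
          ,CAST(CurrentPerm AS DECIMAL(18,2)) / NULLIFZERO(MaxPerm) * 100
           AS PctUsed
    FROM   DBC.DiskSpace
    WHERE  DatabaseName = 'SALES_MART'
    AND    CurrentPerm > 0.90 * MaxPerm
    ORDER BY PctUsed DESC;

Any row returned means at least one AMP is close to full, even if the total across all AMPs is still comfortably under 500 GB.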
Regards,
Victor