Re: Partitioning Start_Date End_Date pair for Dimensions [From Curley, David: Wed, 24 Jun 2009 @ 15:58 GMT]

Message Posted: Wed, 24 Jun 2009 @ 15:58:23 GMT

Subj:

Re: Partitioning Start_Date End_Date pair for Dimensions

From:

Curley, David

Srikanth,

Partitioning really only helps if queries will include a list or a range of values to compare to the partitioning field. In your case, partitioning your dimension table on start or end date will likely be detrimental: for any given query, there's no way of knowing ahead of time the range of start and/or end dates you'll need from the dimension table (with an exception below).

There are a few things you can do, though.

1) If most queries use only a few columns from the dimension table, a join index with the (or most-used) fact table might help (at the cost of storage and possible changes to your ETL).

2) Look at converting the dimension to a snowflake by vertically partitioning the table to separate slower- and faster-changing fields. If it currently has data like "first use date" and "last use date" put them in separate tables - the former never changes and the later can change every day. Queries that use only the slower data will see fewer rows.

3) If most queries you're supporting use only the most recent row (say, users are primarily interested in what happened yesterday), you might benefit by partitioning on the end date and making sure to include "and dim.end_dt = 9999/12/31" in queries. An alternative is to have a flag signaling the current row and partition on that. This might help if you have an operational application that needs results quickly and is always looking only at current data, but it won't help much when you get to weekly/monthly reporting and ad hoc analysis (by which time the most current row isn't necessarily the one as of week/month end).

4) If the PI for the fact table is rarely included in queries (i.e., it's mostly table scanned), you could try making the dimension ID the PI of the fact table. That might help if this dimension table is large and the PI distributes well, and if the fact table is already scanned most of the time, you won't be losing much on that end.

Dave

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Wed, 24 Jun 2009 @ 15:58:23 GMT