Archives of the TeradataForum
Message Posted: Tue, 06 Apr 2004 @ 21:22:06 GMT
I am testing the possibility of replacing a NUSI on my largest fact table with a join index defined on the same columns. We currently rely heavily on the NUSI, defined on geography and time ( LocID, PeriodID ), to provide indexed access into the base table through the small table - large table join strategies within Teradata. Our problem has always been that the NUSI-based acces was rejected by the optimzer with large cardinalities of the small table product joins. We have proven to ourselves, by "faking" out the optimzer with nonsensical where clauses, that indexed based access is indeed faster for combinations of time and geography beyond that for which the optimizer will plan to use the NUSI. It is my intention to discover if the join index will be used with these larger spans of time and geography. My theory is that the join index based access plan, involving an all-amps redistribution of the product join result between the two (relatively) small tables, would be less costly at higher levels of time and geography than the all amps duplication step required by the NUSI based approach.
Consider a typical SELECT from our fact table ( which has billions of rows):
SELECT TFB.* FROM TransactionFactBase TFB JOIN LocationVerBase L ON TFB.LocID = L.LocID JOIN TimePeriodBase TP ON TFB.PeriodID = TP.PeriodID WHERE L.AreaID = 5 AND TP.MonthID = 34;
Our "small" tables, LocationVerBase (UPI on LocID, 40,000 rows) and TimePeriodBase (UPI on PeriodID(integerdate), 1,000s of rows) are typically filtered on DistrictID (1 District ~ 80 locations) or AreaID ( 1 Area ~ 1,000s of locations) and WeekID (7 rows) or MonthID (~30 rows). One sample product join at an Area, Month level can produce 5,000 (locations) * 30 (days) * 760 ( amps) = 114,000,000 location, period rows ( after duplication on all amps). Of course, with great spans of time and geography, we get too many rows and the optimzer will not use the secondary index. I have tested on samller dev datasets a join index, defined as SELECT (LocID, PeriodID), (ROWID) FROM TransactionFactBase PRIMARY INDEX ( LocID, PeriodID ), instead of the NUSI. My preliminary findings are that the optimizer will product join the small tables, redistributing the result to join to the join index thereby obtaining the ROWIDs for the selected rows, for higher spans of time and geography than the NUSI case.
Now comes my real question: Has anyone actually implemented this approach in a real-world large table(s) scenario? Does this strategy actually work? Are there any drawbacks? I am aware of the added JI (non-local) maintenance overhead. Any discussion appreciated.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|