Archives of the TeradataForum
Message Posted: Fri, 14 Dec 2001 @ 17:35:16 GMT
You didn't explain exactly what was going on when you added additional "days" to your history table. I am going to guess that your queries are performing full table scans, whether you have 1 or 30 days in your history table. This would (generally) produce a linear query response, double the size of the history table double the query response time.
It would also be nice to the relative percentage of customers to transactions or how many unique customers exist at any one time. I would also suggest that if you are storing date in the either the history or latest purchase table, you may want to replace it with a surrogate key. I think you could justify this from just from a disk storage saving and avoid some of the Index anomalies. (Then you would, of course, have to add a date to surrogate key table.)
The trick here is get in and out the "purchase history table" as fast as possible. That means we need to use as much constraining information to limit the size of the first pass through the "purchase history table". For example, if there exists 500,000 unique customers and the latest purchase table contains 200K unique customers at any one time. Any pass through your the 1B row history table will produce a result set of ~400M rows, 40% of the table. A full table scan is the most efficient approach. (Performance is poor).
Switch the PK of the history table to dateid and look for the most recent three days. If we 30 days exist in the history table, we are going to extract ~100M rows, 10% of the table. Again a full table scan is probably the most efficient approach.
Switch the PK of the history table to customerid and dateid. Place constraints on the smaller tables (using IN, nested queries, or derived tables. Any three-day query into the history table should result in a 4% result set (40M rows - 40% * 10%).
This is the key, we need to reduce the size of the result set from the first pass of the large history table. By reducing the result set size from 40% to 4% the optimizer can look for alternatives to the full table scan. The optimizer should look at and constrain the smaller tables first, perform a product join, then access the larger table via the multi-column PK. The performance here should very fast and consistent regardless the size of the history table. What we are looking for in the explain is a ROW HASH MATCH SCAN across ALL-AMPS.
Hope this long-winded explanation helps.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|