|
|
Archives of the TeradataForum
Message Posted: Tue, 26 Mar 2002 @ 23:07:39 GMT
Subj: | | Re: Table partitioning |
|
From: | | Edmond Ray |
Guenter,
I helped develop just such an application, except we took it a little further. We were inserting 400 time based rows per second into a
table that was 550 million rows, 700 bytes each. The system was only a 4 node 5150 running Teradata V2R3. We did some testing and
determined that inserting into 400 'day tables' would increase insert rates as well as query rates as long as you queried the tables
individually. When we tried to UNION more than 2 tables even with only 1 million rows each, query times increase exponentially. We were
pretty sure it was due to cost of moving all of that data into spool at the same time. Even if the tables are read in parallel they still
have to be placed in spool to do the UNION. The approach we ended up using was to split the query in the application and issued a query for
each table in a different sessions, then we returned all the answer sets to the same output file to combine them. This was much more
efficient and returned sub second, first row responses for nearly all queries.
The other sticky issue is usually time based data uses 'BETWEEN time1 AND time2' type selection criteria in the WHERE clause. A BETWEEN
clause will always cause a full table scan unless you have a value ordered index on the column in the BETWEEN. You can only use a value
ordered index on a 4 byte or smaller numeric column such as an INTEGER, DATE, DECIMAL, BYTEINT, or SMALLINT. We found the most efficient
way to deal with this was a value ordered index on an integer column with Julian seconds since the tables were already divided by date.
This also required custom code in the application, but yielded the sub-second query response times. You could use a value ordered index on
DATE if you use month tables and then you probable would not need the custom coding in the application. Chances are you requirements are
not as tough as ours were, hope this helps.
Ed
| |