Archives of the TeradataForum
Message Posted: Fri, 14 Mar 2008 @ 10:15:34 GMT
I agree it is excruciatingly slow, in a recent project that I worked on the customer insisted on using SP's and we were faced with an estimated 36+ hour run. Fortunately we were able to identify some special cases where 95% of the data volume could be handled using insert selects and/or updates. The remaining 5% had to be handled via a cursor. The result was that the run time came down to about 30-40 minutes. I don't remember the breakdown, but the majority of that time was in the processing of the 5% via the cursor.
I'm not sure that another type of table is the best approach, you mentioned using a volatile table - which is already pretty fast because it doesn't do have the overheads of a "real" table.
The issue is more one of row at a time. I think Dieter mentioned this as well. Row at a time is slow. In general terms, for every row to be processed, there is:
* a request which must be sent to the DB.
* parsing of the request.
* checking of permissions.
* processing of the request (i.e. doing the insert, delete or whatever)
* transmission of a response (e.g. success) back to the client.
This whole cycle is repeated for each row.
In Java (and other languages) there are some additional go fast options:
1) Prepared Statements - this gives TD to cache queries and bypass the parsing & checking steps. I'm sure that you can use this in a procedure with the "Dynamic request input host variables" question mark operator and Using modifier ("Open embedded SQL form").
2) Multi-threading. The problem with an SP is that it is sequential i.e. one after the other. The problem with this is that 1 disk (Maybe a couple if mirroring, fallback, raid etc) on 1-AMP on 1 node is servicing the query. All of the others are essentially doing nothing (OK, maybe doing other stuff for other people). The point is that you are only using a small amount of the system resource and even then only briefly, a lot of time is spent communicating - even if it is just over the internal BYNet. With threading (and multiple sessions) you can achieve much higher throughput 100's or even 1000's of queries per second, simply by getting more of the system servicing your application.
3) Batching. Batching is about sending multiple rows at a time per request. It is sort of like multi-statement requests, but without the statements. A Batch is a group of values (data parcels) supplied to a previously prepared query (as outlined in 1 above). Essentially it is like saying here is a query and here are the 100 sets of values (rows) to run it with. This leverages the parallelism of Teradata in a single request. Combined with multi-threading/multi-session you can achieve very high throughput rates.
What I don't know is whether the ANSI syntax for SP's allows the constructs I mention above (well #1 is already permitted in TD SP's). If it does, these are the things I would like to see. Because right now, I can build a Java process that far outstrips the performance of the equivalent SP. Imagine if we could use these techniques inside TD and eliminate the Network communications overhead!
Onya Dieter (-> Ozzy slang for: Well done Dieter, keep up the good work!)
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|