Archives of the TeradataForum
Message Posted: Thu, 09 Oct 2003 @ 19:30:46 GMT
I'm having problems with OLAP functions again. I've written to the list before about this issue but never got anywhere. It seems that no one really knows what goes on inside Teradata when it comes to exactly how OLAP functions accomplish their tasks. Maybe a Teradata guru out there could provide some insight. (Hint, Hint.) Here is the problem:
OLAP FUNCTIONS USE WAY TOO MUCH SPOOL!
Case in point: I have 1 volatile table loaded with data. Let's call it V1. My current spool is 15 gig. I have nothing else running. Now I run a query on V1 that simply selects all data in it with the addition of 1 OLAP calculation, and then inserts the results into another table, call it V2.
Insert into V2
Here is the explain:
Guess what happens? Before the job blows on spool, I end up with about 90 gig of spool being used. 90! What the heck? There is no hot-amping going on here. All amps have roughly equal amounts of data. Why on earth does one OLAP function cause my spool to balloon to over 6 times the size of the original table? It makes no sense. It's like Teradata is internally making 4 or 5 copies of every row. This sounds a bit outrageous, but it may not be far from the truth. I did a further test. I rewrote the query like this:
Insert into V2
This method deals only with a small portion of each row when performing the OLAP function--it uses the primary_index and a couple of other fields. After the OLAP function completes, the result is put into a derived table. This is then joined back to the rest of the data in V1, and then inserted into V2. So it does the same thing as the first version of the query, except in a more round about way.
What I found is that the second version not only uses far less spool, but also runs much faster. The only real change to the query is that the OLAP function no longer operates on the entire row from V1, but rather just a subset (the keys and whatnot). This supports my hypothesis that Teradata is creating a lot of extra "work data".
It would be nice to know why OLAP functions behave as they do (with regard to spool usage) and whether or not any changes are coming down the pike that will reduce their space requirements.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|