Archives of the TeradataForum
Message Posted: Sun, 10 Sep 2002 @ 18:14:31 GMT
I am running some queries on a large table in our system and am noticing some seemingly bizarre spool usage. During query execution a large amount of data is going to just a few amps, creating a "hot amp" situation. Moreover, the total amount of spool being used ranges from a little over the size of the table being queried to three times that much (at different points during execution). I have narrowed down the problem to the OLAP functions I am using. Here is an example of a test query that produces the same spool problems as the original query.
sel A.* ,CSUM(1,A.COLUMN_1,A.COLUMN_2) from TABLE_1 A group by COLUMN_1;
Table "A" is not a view and is evenly distributed on column 1. Column 1 is the non-unique primary index. The logical key to the table is columns 1 and 2 (there is no corresponding index, but if the table were redistributed on these columns, the data would still be perfectly even).
Okay. Consider this: when I just select all data (no OLAP function) from the table, I get about 215 megs of data per amp (48 amps in all). When I add the OLAP function, data gets shuffled around such that at some points (during execution) the busiest amp will have 1,000+ megs while the least busy will have close to 18.
I don't get why data is being shuffled around at all. I am grouping by the primary index, so it would seem that each amp has all the rows it needs before the query even starts. Again, the table is already distributed on column 1 and that is the column on which I am basing the csum function. It would seem that all each amp would have to do is sort its own data and compute the csum value. Why data is being passed all over place, and to some places more than others, is beyond me. Also, even if for some strange reason the data was being re- distributed on the sort columns (which I would argue is unnecessary in this case), there should still be no skew. Yet there is.
I would appreciate any explanatory and/or helpful thoughts. Also, as a side question, can anyone tell me what it means for an explain to state "which is assumed to be redistributed by value to all AMPs".
Here is the explain:
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|