|
|
Archives of the TeradataForum
Message Posted: Mon, 05 Mar 2007 @ 20:02:43 GMT
Subj: | | Re: Performance of a self join |
|
From: | | Curley, David |
I wouldn't call that a classic self join - you're summarizing A and then joining to that summary (I'm assuming that A is also in the select
list, not just the derived table B). As with any join, if it's not on the PI, it's going to require redistribution or duplication somewhere along
the line.
In your example, you mightn't need the derived table at all. Since all it's doing is summarizing by dim1, dim2, dim3, you can probably push
all that into the OLAP function. You don't say how you're calculating "Value 2", but something like this might work:
Select A.*,
A.value1/ (sum(A.metric1) over (partition by A.dim1, A.dim2, A.dim3)
/
sum(A.metric2) over (partition by A.dim1, A.dim2, A.dim3))
>From A
That should be faster, but it won't be _that_ fast unless A has a PI of (dim1, dim2, dim3). If it doesn't, it'll have to redistribute
the table in order to get all rows in each partition on the same AMP. (I'd expect that Teradata would recognize if an OLAP partition clause
matches the PI and so not attempt to redistribute, but I don't know if that's true - can anybody else confirm that?) It also may or may not work
if you add additional criteria to the where clause, it depends on the population of rows you want to use to calculate the denominator.
If this is for a tactical application or will be run frequently, you might want to look at using a join index to physicalize all or part of the
work the query has to do.
Dave
| |