|
|
Archives of the TeradataForum
Message Posted: Fri, 28 Aug 2009 @ 19:22:50 GMT
Subj: | | Re: Easy way to get the latest updated row in a query |
|
From: | | Victor Sokovin |
| If MAX(timeStampKey1) and MAX(sequenceNbrKey1) can be on different rows, one of the QUALIFY clauses could fail when the other passed, so
we're still out of luck unless we're allowed to return more than one row. I think they'd have to fish or cut bait on requirements and pick one
of | |
| 1) get the most recent row and more than one if timestamps are the same; | |
| 2) get one row based on the most recent rows plus a tie-breaker (sequence, some amount, or arbitrary); or | |
| 3) get the highest sequence (and assume - or hope - that it's also the most recent row). | |
Dave, my suggestion to use two QUALIFYs was only meant to better match the original query. Where the two MAX conditions in it would fail to
return any rows, the same mismatch would occur for the two QUALIFYs. Whether that query by itself was describing the best possible strategy for
the table I don't know, of course.
| Whichever, an OLAP query is very likely the way to do it. And if multiple rows are allowed, didn't we determine a while ago that for
the most part, using | |
| value = MAX(value) over (partition by X) | |
| RANK() over (partition by X order by value) = 1 | |
| Since it doesn't have to sort? | |
I agree that MAX should be more economical, although I seem to recall that conditions like RANK() = 1 are meant to tell the optimizer to
simplify the sorting process but it is difficult to check whether this is indeed happening.
| As far whether or not OLAP is the best approach, what if you had an AJI that had the MAX timestamps for each key? Would it be able
to use that to pick out the correct rows faster than the OLAP scan? (Let's punt on the one-row/timestamp+sequence/etc issue for this.) | |
I think this would be a faster approach but instead of JI we could even build proper tables where all these ranking issues are addressed once
and for all, i.e., the latest timestamps and seq. numbers are sorted out, exceptions are properly analyzed and reported etc. It all depends on who
and how often needs to run such queries. If it is the end users who need them frequently then it would be unfair to make them sort out the same
problems again and again. Build the "corrected" tables in the presentation layer and just let them decide whether to use the original tables or
the corrected ones. Using the corrected tables would probably mean more consistent results, though. As we see even in this short discussion, we
were able to come up with three-four slightly different methods which may potentially return different results.
Victor
| |