Archives of the TeradataForum

Message Posted: Fri, 28 Aug 2009 @ 19:22:50 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Easy way to get the latest updated row in a query

From:		Victor Sokovin

If MAX(timeStampKey1) and MAX(sequenceNbrKey1) can be on different rows, one of the QUALIFY clauses could fail when the other passed, so we're still out of luck unless we're allowed to return more than one row. I think they'd have to fish or cut bait on requirements and pick one of

1) get the most recent row and more than one if timestamps are the same;

2) get one row based on the most recent rows plus a tie-breaker (sequence, some amount, or arbitrary); or

3) get the highest sequence (and assume - or hope - that it's also the most recent row).

Dave, my suggestion to use two QUALIFYs was only meant to better match the original query. Where the two MAX conditions in it would fail to return any rows, the same mismatch would occur for the two QUALIFYs. Whether that query by itself was describing the best possible strategy for the table I don't know, of course.

Whichever, an OLAP query is very likely the way to do it. And if multiple rows are allowed, didn't we determine a while ago that for the most part, using

value = MAX(value) over (partition by X)

tends to be faster than

RANK() over (partition by X order by value) = 1

Since it doesn't have to sort?

I agree that MAX should be more economical, although I seem to recall that conditions like RANK() = 1 are meant to tell the optimizer to simplify the sorting process but it is difficult to check whether this is indeed happening.

As far whether or not OLAP is the best approach, what if you had an AJI that had the MAX timestamps for each key? Would it be able to use that to pick out the correct rows faster than the OLAP scan? (Let's punt on the one-row/timestamp+sequence/etc issue for this.)

I think this would be a faster approach but instead of JI we could even build proper tables where all these ranking issues are addressed once and for all, i.e., the latest timestamps and seq. numbers are sorted out, exceptions are properly analyzed and reported etc. It all depends on who and how often needs to run such queries. If it is the end users who need them frequently then it would be unfair to make them sort out the same problems again and again. Build the "corrected" tables in the presentation layer and just let them decide whether to use the original tables or the corrected ones. Using the corrected tables would probably mean more consistent results, though. As we see even in this short discussion, we were able to come up with three-four slightly different methods which may potentially return different results.

Victor


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference