Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Fri, 28 Aug 2009 @ 16:02:02 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Easy way to get the latest updated row in a query
 
From:   Curley, David

Victor wrote

  In general, we don't know how column-level maxima are distributed across the rows in partitions. It can happen that MAX(timeStampKey1) and MAX(sequenceNbrKey1) can sit on different ones and the JOIN in the OP's query will return no rows. A more careful rewrite of that query in OLAP terms should probably include two QUALIFY clauses. Agreed?  


If MAX(timeStampKey1) and MAX(sequenceNbrKey1) can be on different rows, one of the QUALIFY clauses could fail when the other passed, so we're still out of luck unless we're allowed to return more than one row. I think they'd have to fish or cut bait on requirements and pick one of

1) get the most recent row and more than one if timestamps are the same;

2) get one row based on the most recent rows plus a tie-breaker (sequence, some amount, or arbitrary); or

3) get the highest sequence (and assume - or hope - that it's also the most recent row).


Whichever, an OLAP query is very likely the way to do it. And if multiple rows are allowed, didn't we determine a while ago that for the most part, using

     value = MAX(value) over (partition by X)

tends to be faster than

     RANK() over (partition by X order by value) = 1

Since it doesn't have to sort?

As far whether or not OLAP is the best approach, what if you had an AJI that had the MAX timestamps for each key? Would it be able to use that to pick out the correct rows faster than the OLAP scan? (Let's punt on the one-row/timestamp+sequence/etc issue for this.)


Dave



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023