Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Mon, 23 Jul 2007 @ 15:33:09 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: MAX vs rank-qualify
 
From:   Dieter Noeth

Anomy.Anom wrote:

  The intention of the query is to find the latest record from a history table. One uses rank and qualify (Query A) and the other max function (Query B) .  


  The difference I found in the EXPLAINS is that the query using Rank and Qualify scans the table only once, whereas with the MAX- method, the table is scanned twice. Given that my table is bulky (over 100 million Rows), should query A be the automatic choice? I remember having read that its 'better' to avoid OLAP functions.  


Better than what?

As soon as you have to access the same table more than once, the OLAP version will probably be more efficient, especially if you're using complex views.

I once run some tests and i almost any case OLAP used a lot less CPU, but typically more Spool (Because of the two spools within that STAT FUNCTION step).

And OLAP functions always need the same resources regardless of the date, i.e. lots or few rows per value, whereas aggregates might vary. And because there's a join involved, the plan might change with different table sizes and rows per value.

And it's easier to write...

And EXPLAIN is shorter...


          > /*QUERY A*/
          >
          > select
          >  AAG.*
          >>from
          >  ddedwmig_vbdw.v_ACCOUNT_ACCOUNT_GROUP AAG
          >  CROSS JOIN CPMS_CURR_PERIOD CCP
          > where
          >  Source_System_Id = '10300' and
          >  ACCOUNT_ASSOC_START_DT  between CCP.START_DT and CCP.END_DT
          >
          > qualify
          >  row_number() over (partition by account_num, account_modifier_num,
          > selection_criterion_id
          >  order by account_num ,account_modifier_num, selection_criterion_id,
          > account_assoc_start_dt desc ) = 1
          > ;

You can get rid of the "order by account_num ,account_modifier_num, selection_criterion_id", because this is already the partitioning. And i don't know if the optimizer is smart enough to recognize that, if not you'll need more spool to order by 4 rows instead of 1.

"order by account_assoc_start_dt desc"


          > /*QUERY B */
          > sel * from ddedwmig_vbdw.v_ACCOUNT_ACCOUNT_GROUP
          > where
          > (account_num, account_modifier_num, selection_criterion_id,
          > ACCOUNT_ASSOC_START_DT) in
          > (
          > sel account_num, account_modifier_num, selection_criterion_id,
          > max(ACCOUNT_ASSOC_START_DT) from
          >  ddedwmig_vbdw.v_ACCOUNT_ACCOUNT_GROUP CROSS JOIN CPMS_CURR_PERIOD CCP

          > where
          >  Source_System_Id = '10300' and
          >  ACCOUNT_ASSOC_START_DT  between CCP.START_DT and CCP.END_DT
          > group by 1, 2, 3
          > )

and Source_System_Id = '10300'

Be carefull, you usually have to repeat the same WHERE condition for the outer query or you get a different answer set.


Dieter



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023