Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 17 Jun 2003 @ 18:22:27 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Consecutive sequence sql logic
 
From:   Hough, David A

I had an aha moment last night, and found I could eliminate one of the tables (p2) from my original solution and simply calculate the between range from the p1 values. It surprised me, though, because it runs about 20% slower than the original on my 125K rows of test data. I suspect this is because the data is sparse enough the the p1/p2 join is eliminating a lot of rows up front. If the data were very dense, the new query might run faster because of one less join. Anyway, here's the shortened version:

select distinct
 p1.acct
from
 cust as p1
  inner join
 cust as p3
  on p3.phone between p1.phone and p1.phone+4 and
     p3.acct = p1.acct
group by
 p1.acct, p1.phone
having
 count(*) = 5
;

I can't remember who first raised the issue of MDIFF, but I took a look at that and it works very well on V2R4.1.3. It's about three times faster that the join versions, though you have to live with the present non-ANSI syntax if you want to use it:

select
 acct
from
 (select
   p1.acct,
   mdiff(p1.phone,4,p1.acct,p1.phone)+1
  from
   cust as p1
  group by
   p1.acct) dt(acct, xdiff)
where
 xdiff = 5
group by
 acct
;

There have been a number of solutions proposed that call out every entry in the phone number space explicitly using conditions and/or joins with phone, phone+1, phone+2, ..., phone+4. I checked a couple of them, and they run about the same speed as the BETWEEN join varients, so I'd skip them because they're hard to expand if the requirements change.

Dieter N�th solved the changing requirements problem very neatly in his solution (my favorite), which returns the exact span sizes in the COUNT(*). If the limit changes from 5 to 10, only the constant in the HAVING clause needs to be changed.

select distinct
   acct
from
  (
   select
     acct
     ,phone
     ,diff
/** this is the trick:
     calculate groups of rows, so consecutive rows with a diff of 0
     will be part of the same datagroup **/
     ,sum(diff) over (partition by acct
                      order by phone
                      rows unbounded preceding) as datagroup
   from
    (
     select
       acct
       ,phone
/** moving difference - 1 between two numbers,
     so consecutive rows have a diff of 0 **/
       ,coalesce(mdiff(phone, 1, phone)-1, 0) as diff
     from cust
     group by acct
    ) dt
  ) dt
group by acct, datagroup
having count(*) >=5
;

The continuing discussion on this problem has been interesting; my thanks to all who have contributed.

/dave hough



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023