Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 17 Jun 2003 @ 18:22:27 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Consecutive sequence sql logic
 
From:   Hough, David A

I had an aha moment last night, and found I could eliminate one of the tables (p2) from my original solution and simply calculate the between range from the p1 values. It surprised me, though, because it runs about 20% slower than the original on my 125K rows of test data. I suspect this is because the data is sparse enough the the p1/p2 join is eliminating a lot of rows up front. If the data were very dense, the new query might run faster because of one less join. Anyway, here's the shortened version:

select distinct
 p1.acct
from
 cust as p1
  inner join
 cust as p3
  on p3.phone between p1.phone and p1.phone+4 and
     p3.acct = p1.acct
group by
 p1.acct, p1.phone
having
 count(*) = 5
;

I can't remember who first raised the issue of MDIFF, but I took a look at that and it works very well on V2R4.1.3. It's about three times faster that the join versions, though you have to live with the present non-ANSI syntax if you want to use it:

select
 acct
from
 (select
   p1.acct,
   mdiff(p1.phone,4,p1.acct,p1.phone)+1
  from
   cust as p1
  group by
   p1.acct) dt(acct, xdiff)
where
 xdiff = 5
group by
 acct
;

There have been a number of solutions proposed that call out every entry in the phone number space explicitly using conditions and/or joins with phone, phone+1, phone+2, ..., phone+4. I checked a couple of them, and they run about the same speed as the BETWEEN join varients, so I'd skip them because they're hard to expand if the requirements change.

Dieter Nöth solved the changing requirements problem very neatly in his solution (my favorite), which returns the exact span sizes in the COUNT(*). If the limit changes from 5 to 10, only the constant in the HAVING clause needs to be changed.

select distinct
   acct
from
  (
   select
     acct
     ,phone
     ,diff
/** this is the trick:
     calculate groups of rows, so consecutive rows with a diff of 0
     will be part of the same datagroup **/
     ,sum(diff) over (partition by acct
                      order by phone
                      rows unbounded preceding) as datagroup
   from
    (
     select
       acct
       ,phone
/** moving difference - 1 between two numbers,
     so consecutive rows have a diff of 0 **/
       ,coalesce(mdiff(phone, 1, phone)-1, 0) as diff
     from cust
     group by acct
    ) dt
  ) dt
group by acct, datagroup
having count(*) >=5
;

The continuing discussion on this problem has been interesting; my thanks to all who have contributed.

/dave hough



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016