Re: Consecutive sequence sql logic [From Hough, David A: Tue, 17 Jun 2003 @ 18:22 GMT]

Message Posted: Tue, 17 Jun 2003 @ 18:22:27 GMT

Subj:

Re: Consecutive sequence sql logic

From:

Hough, David A

I had an aha moment last night, and found I could eliminate one of the tables (p2) from my original solution and simply calculate the between range from the p1 values. It surprised me, though, because it runs about 20% slower than the original on my 125K rows of test data. I suspect this is because the data is sparse enough the the p1/p2 join is eliminating a lot of rows up front. If the data were very dense, the new query might run faster because of one less join. Anyway, here's the shortened version:

select distinct p1.acct from cust as p1 inner join cust as p3 on p3.phone between p1.phone and p1.phone+4 and p3.acct = p1.acct group by p1.acct, p1.phone having count(*) = 5 ;

I can't remember who first raised the issue of MDIFF, but I took a look at that and it works very well on V2R4.1.3. It's about three times faster that the join versions, though you have to live with the present non-ANSI syntax if you want to use it:

select acct from (select p1.acct, mdiff(p1.phone,4,p1.acct,p1.phone)+1 from cust as p1 group by p1.acct) dt(acct, xdiff) where xdiff = 5 group by acct ;

There have been a number of solutions proposed that call out every entry in the phone number space explicitly using conditions and/or joins with phone, phone+1, phone+2, ..., phone+4. I checked a couple of them, and they run about the same speed as the BETWEEN join varients, so I'd skip them because they're hard to expand if the requirements change.

Dieter N�th solved the changing requirements problem very neatly in his solution (my favorite), which returns the exact span sizes in the COUNT(*). If the limit changes from 5 to 10, only the constant in the HAVING clause needs to be changed.

select distinct acct from ( select acct ,phone ,diff /** this is the trick: calculate groups of rows, so consecutive rows with a diff of 0 will be part of the same datagroup **/ ,sum(diff) over (partition by acct order by phone rows unbounded preceding) as datagroup from ( select acct ,phone /** moving difference - 1 between two numbers, so consecutive rows have a diff of 0 **/ ,coalesce(mdiff(phone, 1, phone)-1, 0) as diff from cust group by acct ) dt ) dt group by acct, datagroup having count(*) >=5 ;

The continuing discussion on this problem has been interesting; my thanks to all who have contributed.

/dave hough

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Tue, 17 Jun 2003 @ 18:22:27 GMT