Archives of the TeradataForum
Message Posted: Tue, 17 Jun 2003 @ 18:22:27 GMT
I had an aha moment last night, and found I could eliminate one of the tables (p2) from my original solution and simply calculate the between range from the p1 values. It surprised me, though, because it runs about 20% slower than the original on my 125K rows of test data. I suspect this is because the data is sparse enough the the p1/p2 join is eliminating a lot of rows up front. If the data were very dense, the new query might run faster because of one less join. Anyway, here's the shortened version:
select distinct p1.acct from cust as p1 inner join cust as p3 on p3.phone between p1.phone and p1.phone+4 and p3.acct = p1.acct group by p1.acct, p1.phone having count(*) = 5 ;
I can't remember who first raised the issue of MDIFF, but I took a look at that and it works very well on V2R4.1.3. It's about three times faster that the join versions, though you have to live with the present non-ANSI syntax if you want to use it:
select acct from (select p1.acct, mdiff(p1.phone,4,p1.acct,p1.phone)+1 from cust as p1 group by p1.acct) dt(acct, xdiff) where xdiff = 5 group by acct ;
There have been a number of solutions proposed that call out every entry in the phone number space explicitly using conditions and/or joins with phone, phone+1, phone+2, ..., phone+4. I checked a couple of them, and they run about the same speed as the BETWEEN join varients, so I'd skip them because they're hard to expand if the requirements change.
Dieter Nöth solved the changing requirements problem very neatly in his solution (my favorite), which returns the exact span sizes in the COUNT(*). If the limit changes from 5 to 10, only the constant in the HAVING clause needs to be changed.
select distinct acct from ( select acct ,phone ,diff /** this is the trick: calculate groups of rows, so consecutive rows with a diff of 0 will be part of the same datagroup **/ ,sum(diff) over (partition by acct order by phone rows unbounded preceding) as datagroup from ( select acct ,phone /** moving difference - 1 between two numbers, so consecutive rows have a diff of 0 **/ ,coalesce(mdiff(phone, 1, phone)-1, 0) as diff from cust group by acct ) dt ) dt group by acct, datagroup having count(*) >=5 ;
The continuing discussion on this problem has been interesting; my thanks to all who have contributed.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 27 Dec 2016|