Archives of the TeradataForum
Message Posted: Fri, 05 May 2006 @ 09:37:42 GMT
Subj: | | Re: Distance calcuations |
|
From: | | Stover, Terry |
One thing you might do to limit the UDF calls is constrain the product join based on a max distance, using something like
where abs(a.lat - b.lat) < X and abs(a.long - b.long) < Y. X & Y
represent Delta lat / delta long and correspond to the max distance in miles you're willing to consider as a solution. X & Y are
constants, at least over reasonably small distances (and subject to the precision needs of your app).
In a previous job I had to put together an "on the fly" to average data over a user defined distance from a user defined point. We had
hundreds of millions of rows, and this was on SQL Server. There are GIS add ins for this kind of work, but we had a short lead time and no $. We
bucketed the data into 1/4 mile grid squares and assigned each an x & y sequence numbers (x id, y id) during data load (using compiled C++).
The relative distances by (x id delta, y id delta) were materialized and stored in a table, with max distance of 10 miles. The user entered
lat/long is converted to the x id/ y id on input and the rest of the work is just joins and simple a - b calcs. We had both batch and interactive
users and it was actually pretty zippy. The grid algorithm also allowed comparing results across different test cycles because you can compute
average values for a grid square (there are statistical issues with having different number of samples in a grid across the test cycles). We just
fixed the lat / long conversions by city since our data was collected & stored by metropolitan area. The intention was to use the app as a
high level screening tool to pick areas for more detailed analysis in a GIS tool.
|