Archives of the TeradataForum
Message Posted: Wed, 09 Oct 2002 @ 07:36:39 GMT
IMHO it's far more complex than you would first imagine, unless both datasets are from the same source. Name and address matching/de-duping can be a significant element of the ETL process where a single customer view is required.
The business use of the data will determine the type of matching (individual, household, etc.) and also the tolerance for bad matches and missed matches.
The DIY approach will contain a large element of 're-inventing the wheel'. Also, comparing a record to a variable number of other records (the potential match group) by scanning the individual elements a byte at a time is really not best done in SQL. Even if you can get it to work, will it scale?
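To give a feel for what the DIY route involves, here is a rough Python sketch of the usual shape: build potential match groups with a blocking key, then compare records within each group only. Everything in it is an assumption for illustration: the field names (id, name, address, postcode), the postcode blocking rule, the equal name/address weighting and the 0.85 threshold are all hypothetical, and difflib is a generic string comparison, not what a commercial matching tool does.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalise(text):
    # Lower-case, turn punctuation into spaces, collapse whitespace.
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def blocking_key(record):
    # Hypothetical blocking rule: postcode with spaces removed.
    # Records are only ever compared within the same block.
    return record["postcode"].replace(" ", "").upper()


def similarity(a, b):
    # Assumed scoring: equal weight on name and address similarity.
    name = SequenceMatcher(None, normalise(a["name"]), normalise(b["name"])).ratio()
    addr = SequenceMatcher(None, normalise(a["address"]), normalise(b["address"])).ratio()
    return 0.5 * name + 0.5 * addr


def find_candidate_matches(records, threshold=0.85):
    # Group records into potential match groups.
    groups = defaultdict(list)
    for rec in records:
        groups[blocking_key(rec)].append(rec)

    # Compare every pair inside each group and keep pairs above the threshold.
    matches = []
    for group in groups.values():
        for a, b in combinations(group, 2):
            score = similarity(a, b)
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 2)))
    return matches


# Made-up sample data, purely for illustration.
sample_records = [
    {"id": 1, "name": "John Smith", "address": "12 High Street", "postcode": "AB1 2CD"},
    {"id": 2, "name": "Jon Smith", "address": "12 High St.", "postcode": "ab12cd"},
    {"id": 3, "name": "Mary Jones", "address": "4 Mill Lane", "postcode": "AB1 2CD"},
    {"id": 4, "name": "John Smith", "address": "12 High Street", "postcode": "ZZ9 9ZZ"},
]

candidate_pairs = find_candidate_matches(sample_records)
```

Note what the sketch gets wrong as well as what it gets right. Records 1 and 2 are paired, but record 4 is never compared with record 1 because its postcode puts it in a different block, so a mistyped postcode becomes a missed match. Raising the threshold trades bad matches for missed ones, which is exactly the tolerance decision the business has to make. The pairwise comparison inside each group also grows with the square of the group size, which is where the scaling question bites.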
There are several 'industrial strength' tools out there that I have seen used. One of the favourites is Trillium from Harte-Hanks (that's not an endorsement!).
Hope this helps,