Archives of the TeradataForum

Message Posted: Wed, 09 Oct 2002 @ 07:36:39 GMT



Subj:		Re: Name and address matching strategy

From:		Paul Johnson

IMHO it's far more complex than you would first imagine, unless both datasets are from the same source. Name and address matching/de-duping can be a significant element of the ETL process where a single customer view is required.

The business use of the data will determine the type of matching (individual, household, etc.) and also the tolerance for bad matches and missed matches.
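To illustrate how the choice of match level changes the outcome, here is a minimal Python sketch. It assumes a household match means "same surname at the same address" and an individual match additionally requires the same forename. The `Customer` type, `normalise` and `match_key` are hypothetical names, and the normalisation is far cruder than the standardisation a real tool would apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    forename: str
    surname: str
    address: str


def normalise(text: str) -> str:
    # Hypothetical, very crude normalisation: lower-case,
    # drop punctuation, collapse runs of whitespace.
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def match_key(c: Customer, level: str) -> tuple:
    # Assumed rules: household = same surname at the same address;
    # individual = household key plus the same forename.
    if level == "household":
        return (normalise(c.surname), normalise(c.address))
    if level == "individual":
        return (normalise(c.forename), normalise(c.surname), normalise(c.address))
    raise ValueError(f"unknown match level: {level}")


a = Customer("John", "Smith", "12 High St.")
b = Customer("Jane", "Smith", "12 High St")
print(match_key(a, "household") == match_key(b, "household"))    # True
print(match_key(a, "individual") == match_key(b, "individual"))  # False
```

The same pair of records is a household match but not an individual match, which is why the business use has to be settled before any matching rules are written.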

The DIY approach will contain a large element of 're-inventing the wheel'. Also, comparing a record to a variable number of other records (the potential match group) by scanning the individual elements a byte at a time is really not best done in SQL. Even if you can get it to work, will it scale?
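A minimal sketch of the "potential match group" idea, assuming a hypothetical blocking rule (records are only compared if they share a postcode) and using Python's `difflib.SequenceMatcher` as a stand-in string similarity. This is not how Trillium or any other commercial tool works internally; `block_key`, `similarity`, `candidate_matches` and the 0.85 threshold are illustrative only.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 1, "name": "John Smith", "postcode": "AB1 2CD"},
    {"id": 2, "name": "Jon Smith", "postcode": "AB1 2CD"},
    {"id": 3, "name": "Mary Jones", "postcode": "AB1 2CD"},
    {"id": 4, "name": "John Smith", "postcode": "ZZ9 9ZZ"},
]


def block_key(rec: dict) -> str:
    # Hypothetical blocking rule: only records sharing a postcode
    # end up in the same potential match group.
    return rec["postcode"].replace(" ", "").upper()


def similarity(a: str, b: str) -> float:
    # Stand-in fuzzy comparison: ratio in [0, 1], 1.0 means identical.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(recs: list, threshold: float = 0.85) -> list:
    groups = defaultdict(list)
    for rec in recs:
        groups[block_key(rec)].append(rec)

    matches = []
    for group in groups.values():
        # A group of n records needs n * (n - 1) / 2 pairwise comparisons.
        for left, right in combinations(group, 2):
            score = similarity(left["name"], right["name"])
            if score >= threshold:
                matches.append((left["id"], right["id"], round(score, 3)))
    return matches


print(candidate_matches(records))  # [(1, 2, 0.947)]
```

"Jon Smith" is flagged as a likely duplicate of "John Smith", while record 4 is never compared with record 1 because it falls in a different match group. A group of n records still needs n(n-1)/2 comparisons, so a few oversized groups can dominate the run time, and moving `threshold` up or down trades missed matches against bad matches.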

There are several 'industrial strength' tools out there that I have seen used. One of the favourites is Trillium from Harte-Hanks (that's not an endorsement!).

Hope this helps,

Paul Johnson.