Archives of the TeradataForum

Message Posted: Wed, 09 Oct 2002 @ 07:36:39 GMT


     


Subj:   Re: Name and address matching strategy
 
From:   Paul Johnson

IMHO it's far more complex than you would first imagine, unless both datasets are from the same source. Name and address matching/de-duping can be a significant element of the ETL process where a single customer view is required.

The business use of the data will determine the type of matching (individual, household, etc.) and also the tolerance for bad matches and missed matches.

The DIY approach will contain a large element of 're-inventing the wheel'. Also, comparing a record to a variable number of other records (the potential match group) by scanning the individual elements a byte at a time is really not best done in SQL. Even if you can get it to work, will it scale?
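
To give a feel for what the DIY route involves once it is moved out of SQL, here is a rough Python sketch of the 'potential match group' idea. Everything in it is an assumption for illustration only: the field names (id, name, address, postcode), the choice of postcode as the grouping key, the use of difflib for scoring and the 0.85 threshold. It is not how Trillium or any other tool works, and a real job would need far better standardisation of names and addresses.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalise(text):
    """Upper-case, drop commas and full stops, collapse runs of whitespace."""
    cleaned = text.upper().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def block_key(record):
    """Hypothetical grouping key: the postcode with spaces removed."""
    return record["postcode"].replace(" ", "").upper()


def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def find_matches(records, threshold=0.85):
    """Compare each record only with the others in its potential match group."""
    groups = defaultdict(list)
    for rec in records:
        groups[block_key(rec)].append(rec)

    matches = []
    for group in groups.values():
        # Every pair within a group is scored; groups are never compared
        # with each other, which is what keeps the work manageable.
        for left, right in combinations(group, 2):
            left_text = normalise(left["name"] + " " + left["address"])
            right_text = normalise(right["name"] + " " + right["address"])
            score = similarity(left_text, right_text)
            if score >= threshold:
                matches.append((left["id"], right["id"], round(score, 2)))
    return matches
```

The threshold is where the tolerance for bad matches versus missed matches shows up: lower it and more doubtful pairs get through, raise it and more genuine duplicates are missed. The grouping key has the same trade-off, since two records for the same customer with differently keyed postcodes land in different groups and are never compared at all.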

There are several 'industrial strength' tools out there that I have seen used. One of the favourites is Trillium from Harte-Hanks (that's not an endorsement!).


Hope this helps,

Paul Johnson.



     
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016