Abstract
Name variants that differ by more than a few characters can seriously hamper record linkage. A method is described by which variants of first names and surnames can be learned automatically from records that contain more information than needed for a true link decision. Post-processing and limited manual intervention (active learning) are unavoidable, however, to differentiate errors in the original and the digitised data from variants. The method is demonstrated on the basis of an analysis of 14.8 million records from the Dutch vital registration.
Original language | English |
---|---|
Title of host publication | Population Reconstruction |
Editors | Gerrit Bloothooft, Peter Christen, Kees Mandemakers, Marijn Schraagen |
Place of Publication | Cham |
Publisher | Springer |
Chapter | 4 |
Pages | 61-83 |
Number of pages | 23 |
ISBN (Electronic) | 978-3-319-19884-2 |
ISBN (Print) | 978-3-319-19883-5 |
Publication status | Published - 4 Aug 2015 |
Keywords
- record linkage
- historical
- names