Learning Name Variants from Inexact High-Confidence Matches

Research output: Chapter in Book/Report/Conference proceeding > Chapter > Academic > peer-review

Abstract

Name variants that differ by more than a few characters can seriously hamper record linkage. A method is described by which variants of first names and surnames can be learned automatically from records that contain more information than is needed for a true link decision. Post-processing and limited manual intervention (active learning) are unavoidable, however, to differentiate errors in the original and the digitised data from variants. The method is demonstrated on the basis of an analysis of 14.8 million records from the Dutch vital registration.
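
The idea in the abstract can be illustrated with a minimal sketch. This is not the chapter's actual procedure: the field names, the sample records, and the rule "all names but one agree exactly" are assumptions made here for illustration only. When two records carry several names and agree on every name except one, the link is still fairly safe, and the one differing pair becomes a candidate variant.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical field names; real certificates would carry more context.
FIELDS = ("child_first", "father_first", "father_sur",
          "mother_first", "mother_sur")


def candidate_variants(records, fields=FIELDS):
    """Count name pairs that differ between records agreeing on all other names."""
    groups = defaultdict(set)
    for rec in records:
        for left_out in fields:
            # Key on every field except one; records sharing this key
            # match exactly on all the remaining names.
            key = (left_out,) + tuple(rec[f] for f in fields if f != left_out)
            groups[key].add(rec[left_out])

    counts = Counter()
    for names in groups.values():
        # Distinct values for the left-out field are candidate variants.
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return counts


# Invented sample data, for illustration only.
records = [
    dict(child_first="Jan", father_first="Pieter", father_sur="Jansen",
         mother_first="Maria", mother_sur="Visser"),
    dict(child_first="Jan", father_first="Pieter", father_sur="Janssen",
         mother_first="Maria", mother_sur="Visser"),
    dict(child_first="Kees", father_first="Willem", father_sur="Bakker",
         mother_first="Anna", mother_sur="Smit"),
]

variants = candidate_variants(records)
```

The resulting counts are only candidates. As the abstract notes, a later review step is still needed to tell genuine variants apart from errors in the original or digitised data.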
Original language: English
Title of host publication: Population Reconstruction
Editors: Gerrit Bloothooft, Peter Christen, Kees Mandemakers, Marijn Schraagen
Place of Publication: Cham
Publisher: Springer
Chapter: 4
Pages: 61-83
Number of pages: 23
ISBN (Electronic): 978-3-319-19884-2
ISBN (Print): 978-3-319-19883-5
Publication status: Published - 4 Aug 2015

Keywords

  • record linkage
  • historical
  • names
