Improving Probabilistic Record Linkage Using Statistical Prediction Models

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Record linkage brings together information from records in two or more data sources that are believed to belong to the same statistical unit based on a common set of matching variables. Matching variables, however, can appear with errors and variations and the challenge is to link statistical units that are subject to error. We provide an overview of record linkage techniques and specifically investigate the classic Fellegi and Sunter probabilistic record linkage framework to assess whether the decision rule for classifying pairs into sets of matches and non-matches can be improved by incorporating a statistical prediction model. We also study whether the enhanced linkage rule can provide better results in terms of preserving associations between variables in the linked data file that are not used in the matching procedure. A simulation study and an application based on real data are used to evaluate the methods.
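
For readers unfamiliar with the classic Fellegi and Sunter decision rule mentioned in the abstract, the following is a minimal sketch of the standard framework only, not of the enhanced rule the article proposes. Each candidate pair receives a weight: the sum of log-likelihood ratios over the matching variables, where m is the probability of agreement among true matches and u the probability of agreement among non-matches. The pair is then classified by comparing that weight with two thresholds. The function names, the m and u values, and the thresholds are all hypothetical and chosen purely for illustration.

```python
import math

def match_weight(agreement, m_probs, u_probs):
    """Fellegi-Sunter composite weight for one candidate record pair.

    agreement: list of booleans, one per matching variable
    m_probs:   P(variable agrees | pair is a true match)
    u_probs:   P(variable agrees | pair is a non-match)
    """
    total = 0.0
    for agrees, m_k, u_k in zip(agreement, m_probs, u_probs):
        if agrees:
            total += math.log2(m_k / u_k)              # agreement weight
        else:
            total += math.log2((1 - m_k) / (1 - u_k))  # disagreement weight
    return total

def classify(weight, lower, upper):
    """Three-way decision rule: match, non-match, or clerical review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"

# Hypothetical parameters for three matching variables (assumed values).
M_PROBS = [0.95, 0.90, 0.85]
U_PROBS = [0.01, 0.05, 0.10]
LOWER, UPPER = -5.0, 5.0  # assumed thresholds

w_all = match_weight([True, True, True], M_PROBS, U_PROBS)
w_mixed = match_weight([True, False, False], M_PROBS, U_PROBS)
w_none = match_weight([False, False, False], M_PROBS, U_PROBS)

for label, w in [("all agree", w_all), ("mixed", w_mixed), ("none agree", w_none)]:
    print(f"{label}: weight={w:.2f} -> {classify(w, LOWER, UPPER)}")
```

In practice the m and u probabilities are typically estimated from the data (for example with an EM algorithm), and the thresholds are chosen to control the expected rates of false matches and false non-matches.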

Original language: English
Pages (from-to): 368-394
Number of pages: 27
Journal: International Statistical Review
Volume: 91
Issue number: 3
Early online date: 2022
DOIs
Publication status: Published - Dec 2023

Bibliographical note

Publisher Copyright:
© 2022 The Authors. International Statistical Review published by John Wiley & Sons Ltd on behalf of International Statistical Institute.

Funding

The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 730998 (InGRID‐2 Integrating Research Infrastructure for European expertise on Inclusive Growth from data to policy).

Funder: Horizon 2020
Funder number: 730998

    Keywords

    • Linkage errors
    • matching variables
    • predictions
    • propensity scores
