Abstract
National Statistical Institutes are increasingly interested in combining available datasets that cover a wide range of social domains. Statistical matching approaches integrate data sources through a common set of variables when each dataset contains different units from the same target population. A common problem, however, is the reliance on the assumption of conditional independence among variables observed in different data sources. In this context, an auxiliary dataset in which all the variables are observed jointly can improve the statistical matching by providing information on the correlation structure of variables observed across the different datasets. We propose modifying the prediction models from the auxiliary dataset through a calibration step and show that this can improve the outcome of statistical matching in a variety of settings. We evaluate the proposed approach via simulation and an application based on the European Union Statistics on Income and Living Conditions and the Living Costs and Food Survey for the United Kingdom.
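For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of plain predictive mean matching between a donor file and a recipient file that share a common variable. It is not the calibrated approach proposed in the paper: it uses no auxiliary dataset and no calibration step, and it implicitly relies on the conditional independence assumption. The function name `pmm_match`, the variable names, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def pmm_match(x_donor, y_donor, x_recipient):
    """Impute y for recipient units by predictive mean matching.

    Hypothetical helper: fits a linear model of y on x in the donor file,
    then gives each recipient the observed y of the donor whose predicted
    mean is closest to the recipient's predicted mean.
    """
    # Design matrices with an intercept column.
    Xd = np.column_stack([np.ones(len(x_donor)), x_donor])
    Xr = np.column_stack([np.ones(len(x_recipient)), x_recipient])
    # Ordinary least squares fit on the donor file only.
    beta, *_ = np.linalg.lstsq(Xd, y_donor, rcond=None)
    pred_d = Xd @ beta
    pred_r = Xr @ beta
    # Index of the nearest donor (in predicted mean) for each recipient.
    idx = np.abs(pred_r[:, None] - pred_d[None, :]).argmin(axis=1)
    return y_donor[idx], idx

# Simulated data (assumption: one common variable x, linear relationship).
rng = np.random.default_rng(0)
x_a = rng.normal(size=200)                                # donor file: x and y observed
y_a = 2.0 + 1.5 * x_a + rng.normal(scale=0.5, size=200)
x_b = rng.normal(size=50)                                 # recipient file: only x observed
y_imputed, donor_idx = pmm_match(x_a, y_a, x_b)
```

Because each imputed value is copied from an actual donor record, the result contains only values that were observed in the donor file. Any dependence between the imputed variable and variables observed only in the recipient file runs entirely through the common variable, which is the limitation that motivates bringing in an auxiliary dataset.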
Original language | English |
---|---|
Pages (from-to) | 619–642 |
Number of pages | 24 |
Journal | Journal of Survey Statistics and Methodology |
Volume | 11 |
Issue number | 3 |
Early online date | 13 Feb 2023 |
DOIs | |
Publication status | Published - 1 Jun 2023 |
Bibliographical note
Funding Information: The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no 730998 (InGRID-2 Integrating Research Infrastructure for European Expertise on Inclusive Growth from Data to Policy).
Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press on behalf of the American Association for Public Opinion Research.
Keywords
- Data fusion
- Data integration
- Distance hot deck
- Model calibration
- Predictive mean matching