Bayesian Integration of Probability and Nonprobability Samples for Logistic Regression

Camilla Salvatore*, S. Biffignandi, Joseph W. Sakshaug, Arkadiusz Wiśniowski, Bella Struminskaya

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Probability sample (PS) surveys are considered the gold standard for population-based inference but face many challenges due to decreasing response rates, relatively small sample sizes, and increasing costs. In contrast, the use of nonprobability sample (NPS) surveys has increased significantly due to their convenience, large sample sizes, and relatively low costs, but they are susceptible to large selection biases and unknown selection mechanisms. Integrating both sample types in a way that exploits their strengths and overcomes their weaknesses is an ongoing area of methodological research. We build on previous work by proposing a method of supplementing PSs with NPSs to improve analytic inference for logistic regression coefficients and potentially reduce survey costs. Specifically, we use a Bayesian framework for inference. Inference relies on a probability survey with a small sample size, and through the prior structure we incorporate supplementary auxiliary information from a less-expensive (but potentially biased) NPS survey fielded in parallel. The performance of several strongly informative priors constructed from the NPS information is evaluated through a simulation study and real-data application. Overall, the proposed priors reduce the mean-squared error (MSE) of regression coefficients or, in the worst case, perform similarly to a weakly informative (baseline) prior that does not utilize any nonprobability information. Potential cost savings (of up to 68 percent) are evident compared to a probability-only sampling design with the same MSE for different informative priors under different sample sizes and cost scenarios. The algorithm, detailed results, and interactive cost analysis are provided through a Shiny web app as guidance for survey practitioners.
Original language: English
Pages (from-to): 458–492
Number of pages: 35
Journal: Journal of Survey Statistics and Methodology
Volume: 12
Issue number: 2
Early online date: 2023
Publication status: Published - Apr 2024

Bibliographical note

Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press on behalf of the American Association for Public Opinion Research.

Funding

Camilla Salvatore conducted the majority of her work as a PhD candidate at the Department of Economics, Management and Statistics (DEMS), University of Milano-Bicocca. We gratefully acknowledge the DEMS Data Science Lab for supporting this work by providing computational resources. The study design and analysis were not preregistered.

Funders
University of Milano-Bicocca
DEMS Data Science Lab

Keywords

• Bayesian inference
• Data integration
• Online access panel
• Selection bias
• Web survey
