TY - JOUR
T1 - Performance Measures for Sample Selection Bias Correction by Weighting
AU - Liu, An-Chiao
AU - Scholtus, Sander
AU - Deun, Katrijn Van
AU - de Waal, Ton
PY - 2025/3/14
Y1 - 2025/3/14
N2 - When estimating a population parameter by a nonprobability sample, that is, a sample without a known sampling mechanism, the estimate may suffer from sample selection bias. To correct selection bias, one of the often-used methods is assigning a set of unit weights to the nonprobability sample, and estimating the target parameter by a weighted sum. Such weights are often obtained with classification methods. However, a tailor-made framework to evaluate the quality of the assigned weights is missing in the literature, and the evaluation framework for prediction may not be suitable for population parameter estimation by weighting. We try to fill in the gap by discussing several promising performance measures, which are inspired by classical calibration and measures of selection bias. In this paper, we assume that the population parameter of interest is the population mean of a target variable. A simulation study and real data examples show that some performance measures have a strong positive relationship with the mean squared error and/or error of the estimated population mean. These performance measures may be helpful for model selection when constructing weights by logistic regression or machine learning algorithms.
AB - When estimating a population parameter by a nonprobability sample, that is, a sample without a known sampling mechanism, the estimate may suffer from sample selection bias. To correct selection bias, one of the often-used methods is assigning a set of unit weights to the nonprobability sample, and estimating the target parameter by a weighted sum. Such weights are often obtained with classification methods. However, a tailor-made framework to evaluate the quality of the assigned weights is missing in the literature, and the evaluation framework for prediction may not be suitable for population parameter estimation by weighting. We try to fill in the gap by discussing several promising performance measures, which are inspired by classical calibration and measures of selection bias. In this paper, we assume that the population parameter of interest is the population mean of a target variable. A simulation study and real data examples show that some performance measures have a strong positive relationship with the mean squared error and/or error of the estimated population mean. These performance measures may be helpful for model selection when constructing weights by logistic regression or machine learning algorithms.
U2 - 10.1177/0282423X251318463
DO - 10.1177/0282423X251318463
M3 - Article
SN - 0282-423X
JO - Journal of Official Statistics
JF - Journal of Official Statistics
ER -