TY - JOUR
T1 - The potential of benchmark challenges in the social sciences
AU - Pankowska, Paulina
AU - Mendrik, Adrienne
AU - Emery, Tom
AU - Garcia-Bernardo, Javier
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
AB - Social scientists aim to create explanations of the world. For each social phenomenon, scientists have proposed a myriad of theories to explain its working mechanisms. Traditionally, these theories are tested by generating hypotheses, translating them into a statistical model, and assessing the significance of the model’s coefficients. Such an approach, however, often leads to the specification of a large number of (at times contradictory) models, all asserting that they capture the same theory. As things currently stand, there is no framework that allows for a comparison of these models. In this article, we argue that benchmarks can serve as a standard frame of reference that can help to determine which models fit better with empirical observations in a specific context. A benchmark is a standardized validation framework that allows for a direct comparison of the prediction accuracy of various models that address the same research problem. We outline the potential of organizing benchmark challenges in the social sciences and provide recommendations for their utilization.
KW - benchmark challenge
KW - benchmarking
KW - common task method
KW - mass collaboration
KW - validation framework
UR - http://www.scopus.com/inward/record.url?scp=85210732999&partnerID=8YFLogxK
DO - 10.1177/05390184241297742
M3 - Article
AN - SCOPUS:85210732999
SN - 0539-0184
VL - 63
SP - 498
EP - 519
JO - Social Science Information
JF - Social Science Information
IS - 4
ER -