Abstract
Synthetic data – algorithmically generated data – has been considered a novel solution to the data scarcity issue, and a ‘technical fix’ able to fill the gap in areas where real data is sensitive or biased. Different narratives about the nature of synthetic data as either mirroring or replacing real data, alongside diverse evaluation metrics for measuring the fidelity and utility of such data, have proliferated across the machine learning fairness community, in public policy research, privacy and data protection studies, and critical data scholarship. Yet, there is still no consensus on what constitutes ‘high-quality’ synthetic data. Against this background, I demonstrate how the concept of synthetic data introduces an analogical perspective on data. This perspective is relational and regulative, extending the discussion on data quality to encompass questions of data justice and responsible innovation. It invites critical reflections on the purpose and trade-offs involved in synthetic data generation and use, the social practices and power dynamics that underpin and configure it, and how its direction can be shaped in response to changing real-world circumstances and emerging human values. Building on this analysis, I argue that the generation and use of meaningful synthetic data require promoting responsibility in complex AI and data innovation ecosystems, and facilitating forms of algorithmic reparation and responsiveness.
| Original language | English |
|---|---|
| Article number | 20539517251386053 |
| Journal | Big Data and Society |
| Volume | 12 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - 1 Oct 2025 |
Bibliographical note
Publisher Copyright: © The Author(s) 2025. This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
Keywords
- algorithmic fairness
- algorithmic reparation
- data justice
- meaningful human control
- responsible artificial intelligence
- synthetic data