Synthetic data as meaningful data. On Responsibility in data ecosystems

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Synthetic data – algorithmically generated data – has been considered a novel solution to the data scarcity issue, and a ‘technical fix’ able to fill the gap in areas where real data is sensitive or biased. Different narratives about the nature of synthetic data as either mirroring or replacing real data, alongside diverse evaluation metrics for measuring the fidelity and utility of such data, have proliferated across the machine learning fairness community, in public policy research, privacy and data protection studies, and critical data scholarship. Yet, there is still no consensus on what constitutes ‘high-quality’ synthetic data. Against this background, I demonstrate how the concept of synthetic data introduces an analogical perspective on data. This perspective is relational and regulative, extending the discussion on data quality to encompass questions of data justice and responsible innovation. It invites critical reflections on the purpose and trade-offs involved in synthetic data generation and use, the social practices and power dynamics that underpin and configure it, and how its direction can be shaped in response to changing real-world circumstances and emerging human values. Building on this analysis, I argue that the generation and use of meaningful synthetic data require promoting responsibility in complex AI and data innovation ecosystems, and facilitating forms of algorithmic reparation and responsiveness.

Original language: English
Article number: 20539517251386053
Journal: Big Data and Society
Volume: 12
Issue number: 4
Publication status: Published - 1 Oct 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025. This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Keywords

  • algorithmic fairness
  • algorithmic reparation
  • data justice
  • meaningful human control
  • responsible artificial intelligence
  • synthetic data
