Leveraging GPT for the Generation of Multi-Platform Social Media Datasets for Research

Henry Tari, M. Danial Khan, Justus Rutten, Darian Othman, Thales Bertaglia, Rishabh Kaushal, Adriana Iamnitchi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

Abstract

Social media datasets are essential for research on disinformation, influence operations, social sensing, hate speech detection, cyberbullying, and other significant topics. However, access to these datasets is often restricted due to costs and platform regulations. As such, acquiring datasets that span multiple platforms, which are crucial for a comprehensive understanding of the digital ecosystem, is particularly challenging. This paper explores the potential of large language models to create lexically and semantically relevant social media datasets across multiple platforms, aiming to match the quality of real datasets. We employ ChatGPT to generate synthetic data from a real dataset consisting of posts from three different social media platforms. We assess the lexical and semantic properties of the synthetic data and compare them with those of the real data. Our empirical findings suggest that using large language models to generate synthetic multi-platform social media data is promising. However, further enhancements are necessary to improve the fidelity of the outputs.

Original language: English
Title of host publication: HT 2024
Subtitle of host publication: Creative Intelligence - 35th ACM Conference on Hypertext and Social Media
Publisher: Association for Computing Machinery
Pages: 337-343
Number of pages: 7
ISBN (Electronic): 9798400705953
Publication status: Published - 10 Sept 2024
Event: 35th ACM Conference on Hypertext and Social Media, HT 2024 - Poznan, Poland
Duration: 10 Sept 2024 to 13 Sept 2024

Publication series

Name: HT 2024: Creative Intelligence - 35th ACM Conference on Hypertext and Social Media

Conference

Conference: 35th ACM Conference on Hypertext and Social Media, HT 2024
Country/Territory: Poland
City: Poznan
Period: 10/09/24 to 13/09/24

Keywords

  • LLMs
  • Social Media Research
  • Synthetic Data
