Abstract
Online stories, from blog posts to journalistic articles to scientific
publications, are commonly illustrated with media (e.g. images,
audio clips) or statistical summaries (e.g. tables and graphs). Such
“illustrations” are the result of a process of acquiring, parsing, filtering, mining, representing, refining and interacting with data [3].
Unfortunately, such processes are typically taken for granted and
seldom mentioned in the story itself. Although a wide variety of interactive data visualization techniques has recently been developed
(see e.g., [6]), in many cases the illustrations in such publications
are static; this prevents different audiences from engaging with
the data and analyses as they desire. In this paper, we share our experiences with the concept of “data stories”, which tackles both issues and enhances opportunities for outreach, reporting on scientific
inquiry, and FAIR data representation [9].
In journalism, data stories are becoming widely accepted as the output of a process that is in many respects similar to that of a computational scholar: gaining insights by analyzing data sets using (semi-)automated methods and presenting these insights using (interactive) visualizations and other textual outputs based on data [4, 5, 6, 7].
In the context of scientific output, data stories can be regarded
as digital “publications enriched with or linking to related research
results, such as research data, workflows, software, and possibly
connections among them” [1]. However, as infrastructure for (peer-reviewed) enhanced publications is in an early stage of development (see e.g., [2]), scholarly data stories are currently often produced as blog posts discussing a relevant topic. These may be accompanied
by illustrations not limited to a single graph or image but characterized by different forms of interactivity: readers can, for instance,
change the perspective or zoom level of graphs, or cycle through
images or audio clips.
Having experimented successfully with various types and uses
of data stories in the CLARIAH project, we are working towards
a more generic, stable and sustainable infrastructure to create, publish, and archive data stories. This includes providing environments
for reproduction of data stories and verification of data via “close
reading”. From an infrastructure perspective, this involves the provisioning of services for persistent storage of data (e.g. triple stores),
data registration and search (registries), data publication (SPARQL
endpoints, search APIs), data visualization, and (versioned) query creation. These services can be used by environments to develop data stories, whether or not those environments facilitate additional data analysis steps.
For data stories that make use of data analysis, for example via
Jupyter Notebooks [8], the infrastructure also needs to take computational requirements (load balancing) and restrictions (security)
into account. Also, when data sets are restricted for copyright or
privacy reasons, authentication and authorization infrastructure
(AAI) is required.
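To make the intended workflow concrete, the sketch below shows how a single data-story step might retrieve data from a SPARQL endpoint and render a static chart, as could be done inside a Jupyter Notebook. The endpoint URL, query, and property name are hypothetical placeholders for illustration only, not part of the CLARIAH infrastructure described above.

```python
# Minimal sketch: one analysis step of a data story.
# Queries a (hypothetical) SPARQL endpoint and renders a simple chart.
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
import matplotlib.pyplot as plt

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint, placeholder only

QUERY = """
SELECT ?year (COUNT(?item) AS ?count)
WHERE {
  ?item <http://example.org/ontology/year> ?year .   # hypothetical property
}
GROUP BY ?year
ORDER BY ?year
"""

def fetch(endpoint: str, query: str) -> pd.DataFrame:
    """Run a SELECT query and return the result bindings as a DataFrame."""
    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    bindings = client.query().convert()["results"]["bindings"]
    rows = [{var: cell["value"] for var, cell in b.items()} for b in bindings]
    return pd.DataFrame(rows)

if __name__ == "__main__":
    df = fetch(ENDPOINT, QUERY)
    df["count"] = df["count"].astype(int)
    df.plot(x="year", y="count", kind="bar", legend=False)
    plt.ylabel("number of items")
    plt.tight_layout()
    plt.savefig("data_story_figure.png")  # static fallback for the published story
```

Storing such a query in versioned form alongside the story, rather than only the resulting figure, is what would allow readers to verify the underlying data via “close reading”.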
The large and rich data sets in (European) heritage archives, which are increasingly made interoperable following the FAIR principles, provide fertile ground for data stories. We therefore hope to present our experiences with data stories, share our strategy for a more generic solution, and receive feedback on shared challenges.
| Original language | English |
|---|---|
| Number of pages | 1 |
| Publication status | Published - 31 May 2022 |
| Event | DARIAH Annual Event 2022, Athens, Greece, 31 May 2022 → 3 Jun 2022 |
Conference
| Conference | DARIAH Annual Event 2022 |
|---|---|
| Country/Territory | Greece |
| City | Athens |
| Period | 31/05/22 → 3/06/22 |
Keywords
- data stories
- data sets
- storytelling
- digital journalism
- digital humanities
- enhanced publications