Automatic summarization of domain-specific forum threads: Collecting reference data

Suzan Verberne, Antal van den Bosch, Sander Wubben, Emiel Krahmer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


We create and analyze two sets of reference summaries for
discussion threads on a patient support forum: expert summaries
and crowdsourced, non-expert summaries. Ideally,
reference summaries for discussion forum threads are created
by expert members of the forum community. When
there are few or no expert members available, crowdsourcing
the reference summaries is an alternative. In this paper
we investigate whether domain-specific forum data requires
the hiring of domain experts for creating reference
summaries. We analyze the inter-rater agreement for both
datasets and we train summarization models using the two
types of reference summaries. The inter-rater agreement in
crowdsourced reference summaries is low, close to random,
while domain experts achieve a considerably higher, fair,
agreement. The trained models, however, are similar to each
other. We conclude that it is possible to train an extractive
summarization model on crowdsourced data that is similar
to an expert model, even if the inter-rater agreement for the
crowdsourced data is low.
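
The agreement levels mentioned above ("close to random", "fair") can be illustrated with a chance-corrected agreement statistic. The abstract does not name the statistic used, so the sketch below assumes Cohen's kappa between two raters, each marking every post in a thread as selected (1) or not selected (0) for the summary. The function name and the example labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: each list holds one label per post, e.g. 1 if the rater
    selected the post for the summary and 0 otherwise.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement under independence, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical selections for a ten-post thread (illustrative only).
rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

A value near 0 means the raters agree no more often than chance would predict. Under the commonly used Landis and Koch interpretation, values between 0.21 and 0.40 are labelled "fair".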
Original language: English
Title of host publication: Proceedings of the 2017 Conference on Human Information Interaction and Retrieval (CHIIR-2017)
Publisher: Association for Computing Machinery (ACM)
ISBN (Print): 978-1-4503-4677-1
Publication status: Published - 7 Mar 2017
Externally published: Yes

