Phrase Detectives Corpus 1.0: Crowdsourced Anaphoric Coreference

Jon Chamberlain, Massimo Poesio, Udo Kruschwitz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Natural Language Engineering tasks require large and complex annotated datasets to build more advanced models of language. Corpora are typically annotated by several experts to create a gold standard; however, there are now compelling reasons to use a non-expert crowd to annotate text, driven by cost, speed and scalability. Phrase Detectives Corpus 1.0 is an anaphorically annotated corpus of encyclopedic and narrative text that contains a gold standard created by multiple experts, as well as a set of annotations created by a large non-expert crowd. Analysis shows very good inter-expert agreement (κ = .88-.93) but a more variable baseline crowd agreement (κ = .52-.96). Encyclopedic texts show less agreement (and by implication are harder to annotate) than narrative texts. The release of this corpus is intended to encourage research into the use of crowds for text annotation and the development of more advanced, probabilistic language models, in particular for anaphoric coreference.

Original language: English
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Editors: Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
Publisher: European Language Resources Association (ELRA)
Pages: 2039-2046
Number of pages: 8
ISBN (Electronic): 9782951740891
Publication status: Published - 2016
Externally published: Yes
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 23 May 2016 to 28 May 2016

Publication series

Name: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

Conference

Conference: 10th International Conference on Language Resources and Evaluation, LREC 2016
Country/Territory: Slovenia
City: Portoroz
Period: 23/05/16 to 28/05/16

Keywords

  • Anaphora
  • Anaphoric coreference
  • Annotation
  • Corpora
  • Crowdsourcing
  • Games-with-a-purpose
  • Gwap
  • Phrase Detectives
