Annotating implicit coherence relations in parallel corpora.

J. Hoek, S.I. Zufferey

Research output: Contribution to conference › Poster › Other research output

Abstract

Annotating coherence relations is a difficult task that requires detailed annotation schemes and well-trained annotators. Existing discourse-annotated corpora such as the Penn Discourse Treebank, RST Treebank, and the TüBa-D/Z corpus all have different annotation manuals: not only do these corpora differ in the types of relations they distinguish, but also in their segmentation rules and even in their definition of what constitutes a coherence relation. Another problem is caused by the fact that coherence relations can, but need not, be made linguistically explicit by means of connectives (because, if) or cue phrases (as a result, despite the fact that). The absence of a connective seems to introduce additional complications to the annotation process. Implicit coherence relations leave annotators with less evidence pointing toward a particular relation, and locating a coherence relation becomes in itself a potential source of disagreement. In the different discourse-annotated corpora there is even less consensus on how to locate and annotate implicit relations than explicit relations. In this presentation, we argue that parallel corpora are useful tools for locating, annotating, and researching the characteristics of implicit coherence relations. We used directional corpora extracted from the Europarl corpus (Koehn 2005; Cartoni, Zufferey & Meyer 2013a) and manually spotted cases of implicit translations using the translation spotting method (Cartoni, Zufferey & Meyer 2013b) across four target languages (French, German, Dutch and Spanish). Conversely, we spotted implicit relations in English source texts that were explicitated in (one of) the target texts. Finally, we spotted explicitations and implicitations of the English connectives in translated texts from the same four languages, now functioning as the source languages. We then annotated the English discourse relations using the set of basic features defined by Sanders, Spooren & Noordman (1992).
Our results indicate that the basic features of coherence relations conveyed by connectives help to predict their explicit vs. implicit translation across languages.
Original language: English
Publication status: Published - 27 Jan 2015
Event: TextLink First Action Conference - Louvain-la-Neuve, Belgium
Duration: 26 Jan 2015 → 28 Jan 2015

Conference

Conference: TextLink First Action Conference
Country/Territory: Belgium
City: Louvain-la-Neuve
Period: 26/01/15 → 28/01/15
