Segmenting discourse units: Incorporating interpretation into decision rules?

Research output: Contribution to conference › Abstract › Other research output

Abstract

Discourse relations connect two or more segments. Segmentation is an important step in the process of annotating discourse relations, but one that is often not discussed extensively in annotation methods or manuals. Ideally, applying segmentation rules yields text segments that correspond to the units of thought that are related to each other. However, in many widely used annotation systems this does not always seem to be the case. Most formalized segmentation rules (e.g. Carlson & Marcu, 2001; Mann & Thompson, 1988; Reese, Hunter, Asher, Denis, & Baldridge, 2007; Sanders & van Wijk, 1996) would, for instance, not allow segmenting the conditional relation in (1), either because too many elements in S1 have been elided or because the segment following if would break up a larger unit. Still, the segmentation indicated in (1) seems very plausible and exactly captures the two segments related by the connective if:
(1) (context: The virus harms cold-blooded animals.) It does not replicate at temperatures above 25° centigrade and [would,]S2a if [present in fish for human consumption,]S1 [be inactivated when ingested.]S2b (ep 00-03-01) 
In this presentation we discuss fragments encountered during an annotation effort of (explicit) local discourse relations from the Europarl corpus (Koehn, 2005) that are problematic to segment under most segmentation guidelines. We focus on three specific problems: ellipsis, complement structures, and perspective markers. We propose segmentation options that result in segments that do justice to the interpretation of the discourse relation and use translations (from the Europarl Direct corpus; Cartoni, Zufferey, & Meyer, 2013) as additional support for our analysis. Finally, we explore ways to formulate rules that produce text segments that do justice to interpretation. We conclude that segmentation is in part dependent on the propositional content of text fragments, and that completely separating segmentation and annotation (i.e. treating the task as a two-step process) does not always yield text segments that correspond to the text units between which a conceptual relationship (potentially signaled by a connective) holds (see also Verhagen, 2001). Although relying partly on the content of a text fragment results in better text segmentation, this does in turn raise problems for (semi-)automatically segmenting texts. Identifying specific problems, such as the ones addressed here, and being more explicit about the segmentation strategies used in the annotation of discourse relations are important steps toward solving these problems.
Original language: English
Publication status: Published - 25 Jan 2016
Event: LPTS2016 - Valencia, Spain
Duration: 24 Jan 2016 to 26 Jan 2016

Conference

Conference: LPTS2016
Country/Territory: Spain
City: Valencia
Period: 24/01/16 to 26/01/16
