Abstract
Over the last fifteen years, the annotation of discourse relations has attracted increasing interest from the linguistics research community. It is a promising and challenging research area that allows for systematic cross-linguistic comparison at the discourse level. Much progress has been achieved through large discourse-annotated corpora; leading examples are the Penn Discourse Treebank (PDTB; Prasad et al., 2008), the Rhetorical Structure Theory (RST) Treebank (Carlson et al., 2001) and Segmented Discourse Representation Theory (SDRT; Reese et al., 2007). However, existing discourse annotation guidelines differ in important respects, such as the types of relations they distinguish. Some proposals present sets of approximately 20 relations (Mann and Thompson, 1988). The PDTB contains a three-tiered hierarchical classification of 43 sense tags (Prasad et al., 2008), and the annotation scheme used for the RST Treebank distinguishes 78 relations that can be partitioned into 16 classes (Carlson et al., 2001). Hence, it is not clear which and how many categories (for example, contingency, causal, or informational) and end labels (for example, result, volitional cause, and cause-consequence, which are all labels for causal relations) are needed to adequately describe and distinguish coherence relations. What is clear is that annotation has proven to be a difficult task, as is regularly reflected in low inter-annotator agreement scores. Furthermore, it is often hard to compare the outcomes of corpus-based studies, because leading proposals differ in the precise relations they distinguish. This is unfortunate; it would be much better if all these annotated corpora could be compared. Our goal is to suggest how the discourse relation annotations used by the different schemes can be mapped onto one another.
An important consideration is the ability to represent all of the annotations that the different schemes have considered relevant for discourse relation annotation (see Benamara & Taboada, 2015, for an SDRT-RST mapping). More specifically, we will compare PDTB, RST and SDRT in terms of a limited set of dimensions, and show how they map onto each other. We describe relations in terms of the properties they share, which allows relations to be clustered. For instance, all systems distinguish between positive and negative relations, and we will show how this dimension allows for similar clusterings across systems. Some dimensions are similar to those of the Cognitive approach to Coherence Relations (CCR; Sanders et al., 1992), while additional criteria capture more fine-grained distinctions. Describing discourse relations in terms of this limited set of dimensions and criteria shows how the various existing proposals can be related to each other. This leads to a unifying proposal, which allows us to ‘translate’ outcomes from one framework into the terminology of another. In this way, we want to contribute to the ultimate goal: to make optimal use of existing corpora and to facilitate discussion among researchers working in different paradigms.
| Original language | English |
|---|---|
| Publication status | Published - 20 Apr 2016 |
| Event | TextLink Second Action Conference, Budapest, Hungary. Duration: 11 Apr 2016 → 13 Apr 2016 |
Conference
| Conference | TextLink Second Action Conference |
|---|---|
| Country/Territory | Hungary |
| City | Budapest |
| Period | 11/04/16 → 13/04/16 |