Mapping the PERFECT via Translation Mining

Research output: Contribution to conference, Abstract (other research output)

Abstract

Recently, a trend in typology has been to create semantic maps (Haspelmath 1997) not from intuitions and examples but directly from data extracted from multilingual parallel corpora (Wälchli & Cysouw 2012). We continue in the same vein, but focus on grammar rather than on the lexical domain. Specifically, we aim to map the PERFECT across five European languages (Dutch, English, French, German, Spanish), using a method we dub Translation Mining.
We first extracted present perfects from the EuroParl corpus (Tiedemann 2012) using a methodology presented at CLIN26 (van der Klis, Le Bruyn & de Swart 2015). A human annotator, working in a web application designed for this purpose, then marked the corresponding verb phrases in the aligned fragments. The tenses of these verb phrases were then assigned automatically or manually, depending on how detailed the part-of-speech tags are for each language.
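The abstract does not spell out the tagging rules. As a minimal sketch, assuming Penn Treebank tags for English and a verb phrase given as a list of (token, tag) pairs, automatic tense assignment could look like the following. The function name `assign_english_tense` and the rule set are hypothetical, not the study's implementation; anything the rules cannot resolve is returned as "unknown", i.e. left for manual annotation.

```python
from typing import List, Tuple

# Hypothetical input format: a verb phrase as (token, Penn Treebank tag) pairs.
VerbPhrase = List[Tuple[str, str]]


def assign_english_tense(vp: VerbPhrase) -> str:
    """Assign a coarse tense label to an English verb phrase (illustrative rules only)."""
    if not vp:
        return "unknown"
    words = [w.lower() for w, _ in vp]
    tags = [t for _, t in vp]
    head, last_tag = words[0], tags[-1]

    # Auxiliary HAVE followed by a past participle (VBN) marks a perfect.
    if len(vp) > 1 and last_tag == "VBN":
        if head in {"have", "has"}:
            return "present perfect"
        if head == "had":
            return "past perfect"

    # Single finite verbs: past (VBD) or present (VBP/VBZ).
    if len(vp) == 1:
        if last_tag == "VBD":
            return "simple past"
        if last_tag in {"VBP", "VBZ"}:
            return "simple present"

    # Everything else is flagged for a human annotator.
    return "unknown"


print(assign_english_tense([("has", "VBZ"), ("voted", "VBN")]))  # present perfect
```

For languages whose tagsets do not distinguish these forms, a rule set like this would return "unknown" more often, which matches the abstract's point that the split between automatic and manual assignment depends on the tagset.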
This process yielded five-tuples of aligned tense attributions, one tense label per language. We designed a distance measure over these tuples to compute a (dis)similarity matrix, which we then plotted using multidimensional scaling (MDS). On top of that, we built an interactive visualization that lets researchers manipulate the dimensions of the MDS solution and inspect the individual data points.
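The abstract does not define the distance measure. The sketch below assumes a simple Hamming-style distance (the share of languages whose tense labels differ) and embeds the resulting matrix with classical (Torgerson) MDS using NumPy. The function names, the language order and the example tuples are hypothetical and not taken from the study's data.

```python
import numpy as np

# Assumed order of the five tense labels in each tuple (hypothetical).
LANGUAGES = ("nl", "en", "fr", "de", "es")


def tuple_distance(a, b):
    """Share of languages whose tense labels differ (Hamming-style; an assumption)."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(tuples):
    """Symmetric n x n matrix of pairwise tuple distances, zero on the diagonal."""
    n = len(tuples)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = tuple_distance(tuples[i], tuples[j])
    return d


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix into k dimensions."""
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring      # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[order], 0, None))  # drop negative eigenvalues
    return eigvecs[:, order] * scale


# Hypothetical five-tuples (nl, en, fr, de, es), for illustration only.
tuples = [
    ("perfect", "perfect", "perfect", "perfect", "perfect"),
    ("perfect", "past", "perfect", "perfect", "past"),
    ("past", "past", "perfect", "perfect", "past"),
    ("present", "perfect", "present", "present", "present"),
]
d = dissimilarity_matrix(tuples)
coords = classical_mds(d, k=2)
print(d[0, 1])  # 0.4
```

Each row of `coords` is a 2-D position for one tuple, and tuples that share more tense labels tend to end up closer together. Classical MDS is used here only because it has a closed-form solution; iterative metric or non-metric MDS variants (e.g. SMACOF) could be run on the same precomputed matrix.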
These interactive maps allowed us to reproduce earlier findings (e.g. Portner 2003) but also to draw new conclusions about the tense/aspect role of the PERFECT across languages. We repeated the method on the OpenSubtitles2016 corpus (Lison & Tiedemann 2016) to check for genre variation.
Original language: English
Publication status: Published, 2017
Event: Computational Linguistics in the Netherlands, Faculty of Arts (Erasmushuis), Leuven, Belgium
Duration: 10 Feb 2017 to 10 Feb 2017
Conference number: 27
http://www.ccl.kuleuven.be/CLIN27/

Conference

Conference: Computational Linguistics in the Netherlands
Abbreviated title: CLIN
Country/Territory: Belgium
City: Leuven
Period: 10/02/17 to 10/02/17

Keywords

  • semantic maps
  • perfect
  • tense-aspect
  • multilingual parallel corpora
  • multidimensional scaling
