PARSEME corpus release 1.3

Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga Güngör, Thomas Pickard, Bruno Guillaume, Eduard Bejček, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa Iñurrieta, Albert Gatt, Jolanta Kovalevskaite, Timm Lichte, Nikola Ljubešić, Johanna Monti, Carla Parra Escartin, Mehrnoush Shamsfard, Ivelina Stoyanova, Veronika Vincze, Abigail Walsh

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions (VMWEs). Since the previous version, new languages have joined the effort to create this resource, some existing corpora have been enriched with newly annotated texts, and others have been enhanced in various ways. The PARSEME multilingual corpus now covers 26 languages. All monolingual corpora in it use the Universal Dependencies v2 tagset. They are (re-)split according to the PARSEME v1.2 standard, which places emphasis on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.
Original language: English
Title of host publication: 19th Workshop on Multiword Expressions, MWE 2023 - Proceedings
Editors: Archna Bhatia, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
Place of publication: Dubrovnik, Croatia
Publisher: Association for Computational Linguistics
Pages: 24-35
Number of pages: 12
ISBN (Electronic): 9781959429593
Publication status: Published - 1 May 2023

Publication series

Name: 19th Workshop on Multiword Expressions, MWE 2023 - Proceedings

Keywords

  • language resources
  • multilinguality
