Finding Dutch multiword expressions

Jan Odijk, Martin Kroon, Tijmen C. Baarda, Ben Bonfil, Sheean Spoel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

We present MWE-Finder, which enables a user to search for occurrences of multiword expressions (MWEs) in large Dutch text corpora. Components of many MWEs in Dutch can occur in multiple forms, need not be adjacent, and can occur in multiple orders (such MWEs are called flexible). Searching for occurrences of such flexible MWEs is difficult and cannot be done reliably with most search applications. What is needed is a search engine that takes into account the grammatical configuration of the MWE. MWE-Finder is therefore embedded in GrETEL, a treebank search application for Dutch. A user can enter an example of an MWE in a specific canonical form, after which the system searches for sentences in which the MWE occurs, using queries generated automatically from the canonical form. The MWE can also be selected from a list of more than 11,000 canonical forms for Dutch MWEs that MWE-Finder offers. We will show that MWE-Finder also offers facilities to find examples with unexpected modifiers or determiners on components of the MWE.
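
The difficulty with flexible MWEs can be illustrated with a toy sketch. It is not MWE-Finder's method and does not use GrETEL or a treebank: the example idiom ("de dans ontspringen"), the sentences, and the hand-supplied lemmas are all assumptions made for illustration. It only shows that a literal string search misses inflected, reordered, and non-adjacent occurrences, while a lemma-based check finds them.

```python
# Toy illustration only: hand-lemmatised sentences, no treebank,
# and not the queries that MWE-Finder actually generates.

# Content-word lemmas of the example idiom "de dans ontspringen" (assumed example).
MWE_LEMMAS = {"dans", "ontspringen"}

# Each sentence is a list of (word form, lemma) pairs; lemmas are supplied by hand.
SENTENCES = [
    [("Hij", "hij"), ("ontsprong", "ontspringen"), ("de", "de"), ("dans", "dans")],
    [("De", "de"), ("dans", "dans"), ("is", "zijn"), ("hij", "hij"),
     ("niet", "niet"), ("ontsprongen", "ontspringen")],
    [("Zij", "zij"), ("leerde", "leren"), ("de", "de"), ("dans", "dans")],
]


def surface_match(sentence, phrase):
    """Return True if the literal phrase occurs in the sentence (case-insensitive)."""
    text = " ".join(word for word, _ in sentence).lower()
    return phrase.lower() in text


def lemma_match(sentence, lemmas):
    """Return True if every MWE lemma occurs somewhere in the sentence."""
    return lemmas <= {lemma for _, lemma in sentence}


surface_hits = [surface_match(s, "de dans ontspringen") for s in SENTENCES]
lemma_hits = [lemma_match(s, MWE_LEMMAS) for s in SENTENCES]

print(surface_hits)
print(lemma_hits)
```

The lemma check ignores grammatical relations entirely, so it would also accept sentences in which both lemmas merely co-occur. That limitation is the reason the abstract gives for a search that takes the grammatical configuration of the MWE into account.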

Original language: English
Title of host publication: CLARIN Annual Conference Proceedings 2023
Editors: Krister Lindén, Jyrki Niemi, Thalassia Kontino
Place of Publication: Utrecht
Publisher: CLARIN ERIC
Pages: 85-89
Number of pages: 5
Publication status: Published - 16 Oct 2023

Publication series

Name: CLARIN Annual Conference Proceedings
Volume: 2023
ISSN (Electronic): 2773-2177

Keywords

  • Multiword Expressions
  • GrETEL
  • linguistic research infrastructure
  • Dutch
  • treebanks
