Abstract
We present MWE-Finder, which enables a user to search for occurrences of multiword expressions (MWEs) in large Dutch text corpora. Components of many MWEs in Dutch can occur in
multiple forms, need not be adjacent, and can occur in multiple orders (such MWEs are called
flexible). Searching for occurrences of such flexible MWEs is difficult and cannot be done reliably with most search applications. What is needed is a search engine that takes into account
the grammatical configuration of the MWE. MWE-Finder is therefore embedded in GrETEL,
a treebank search application for Dutch. A user can enter an example of a MWE in a specific
canonical form, after which the system searches for sentences in which the MWE occurs, using
queries generated automatically from the canonical form. The MWE can also be selected from a
list of more than 11k canonical forms for Dutch MWEs that MWE-Finder offers. We will show
that MWE-Finder also offers facilities to find examples with unexpected modifiers or determiners
on components of the MWE.
Original language | English |
---|---|
Title of host publication | CLARIN Annual Conference Proceedings 2023 |
Editors | Krister Lindén, Jyrki Niemi, Thalassia Kontino |
Place of Publication | Utrecht |
Publisher | CLARIN ERIC |
Pages | 85-89 |
Number of pages | 5 |
Publication status | Published - 16 Oct 2023 |
Publication series
Name | CLARIN Annual Conference Proceedings |
---|---|
Volume | 2023 |
ISSN (Electronic) | 2773-2177 |
Keywords
- Multiword Expressions
- GrETEL
- linguistic research infrastructure
- Dutch
- treebanks