DutchSemCor: Targeting the ideal sense-tagged corpus

P. Vossen, A. Gorog, R. Izquierdo, A. Van den Bosch

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Word Sense Disambiguation (WSD) systems require large sense-tagged corpora along with lexical databases to reach satisfactory results. The number of English-language resources developed for WSD has increased in recent years, while most other languages remain under-resourced. The situation is no different for Dutch. To overcome this data bottleneck, the DutchSemCor project will deliver a Dutch corpus that is sense-tagged with senses from the Cornetto lexical database. In this paper, we discuss the conflicting requirements for a sense-tagged corpus and our strategies to fulfill them. We report on a first series of experiments to support our semi-automatic approach to building the corpus.
Original language: English
Title of host publication: Proceedings of the Eighth International Conference on Language Resources and Evaluation
Publisher: Association for Computational Linguistics (ACL)
Pages: 584-589
Publication status: Published - 2012
Externally published: Yes
