Abstract
The website of the Dutch Police allows a crime report to be submitted, part of which consists of free text. To automate report processing, relation extraction can be used, which in turn requires accurate named entity recognition (NER). However, NER as offered by current Dutch parsers suffers from limited accuracy, and grammatical and spelling errors in the crime reports impair it even further. The current research aims to evaluate NER results on the crime reports data set using large-scale human judgment. The experiments are in progress, and the first results have been collected. Aspects of this evaluation include the assignment of named entity types, the recognition of multiword entities, mixed-language issues, and theoretical considerations on the nature and use of named entities. The evaluation is intended to provide pointers for increasing NER accuracy on this type of data.
Original language | English |
---|---|
Publication status | Published - 10 Feb 2017 |
Event | CLIN 2017 - KU Leuven, Leuven, Belgium; Duration: 10 Feb 2017 → 10 Feb 2017 |
Conference
Conference | CLIN 2017 |
---|---|
Country/Territory | Belgium |
City | Leuven |
Period | 10/02/17 → 10/02/17 |
Keywords
- named entity recognition
- evaluation
- spelling errors
- free text entry
- crime reports