TY - GEN
T1 - SimLex-999 for Dutch
AU - Brans, Lizzy
AU - Bloem, Jelke
N1 - Publisher Copyright:
© 2024 ELRA Language Resource Association: CC BY-NC 4.0.
PY - 2024/5
Y1 - 2024/5
N2 - Word embeddings revolutionised natural language processing by effectively representing words as dense vectors. Although many datasets exist to evaluate English embeddings, few cater to Dutch. We developed a Dutch variant of the SimLex-999 word similarity dataset by gathering similarity judgements from 235 native Dutch speakers. Subsequently, we evaluated two popular Dutch language models, Bertje and RobBERT, finding that Bertje showed superior alignment with human semantic similarity judgments compared to RobBERT. This study provides the first intrinsic Dutch word embedding evaluation dataset, which enables accurate assessment of these embeddings and fosters the development of effective Dutch language models.
AB - Word embeddings revolutionised natural language processing by effectively representing words as dense vectors. Although many datasets exist to evaluate English embeddings, few cater to Dutch. We developed a Dutch variant of the SimLex-999 word similarity dataset by gathering similarity judgements from 235 native Dutch speakers. Subsequently, we evaluated two popular Dutch language models, Bertje and RobBERT, finding that Bertje showed superior alignment with human semantic similarity judgments compared to RobBERT. This study provides the first intrinsic Dutch word embedding evaluation dataset, which enables accurate assessment of these embeddings and fosters the development of effective Dutch language models.
UR - http://www.scopus.com/inward/record.url?scp=85195889961&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85195889961
T3 - 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
SP - 14832
EP - 14845
BT - 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
A2 - Calzolari, Nicoletta
A2 - Kan, Min-Yen
A2 - Hoste, Veronique
A2 - Lenci, Alessandro
A2 - Sakti, Sakriani
A2 - Xue, Nianwen
PB - European Language Resources Association (ELRA)
T2 - Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Y2 - 20 May 2024 through 25 May 2024
ER -