AraNLP: A Java-based library for the processing of Arabic text

M. Althobaiti, U. Kruschwitz, M. Poesio

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

We present a free, Java-based library named “AraNLP” that covers various Arabic text preprocessing tools. Although a good number of tools for processing Arabic text already exist, integration and compatibility problems continually occur when they are combined. AraNLP is an attempt to gather most of the vital Arabic text preprocessing tools into one easily accessible library, by integrating or accurately adapting existing tools and by developing new ones when required. The library includes a sentence detector, tokenizer, light stemmer, root stemmer, part-of-speech tagger (POS-tagger), word segmenter, normalizer, and a punctuation and diacritic remover.
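To make the last three components concrete, the following is a minimal sketch of what diacritic removal, normalization, and punctuation removal for Arabic text can look like in plain Java. It is not AraNLP's API: the class name `ArabicTextSketch`, its method names, and the specific normalization choices (dropping tatweel, mapping alef variants to bare alef) are assumptions made for illustration, based on common conventions rather than on the library itself.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of AraNLP.
public class ArabicTextSketch {
    // Short-vowel marks, tanwin, shadda, sukun (U+064B..U+0652) and superscript alef (U+0670).
    private static final Pattern DIACRITICS = Pattern.compile("[\\u064B-\\u0652\\u0670]");
    // Tatweel (kashida), the elongation character U+0640.
    private static final Pattern TATWEEL = Pattern.compile("\\u0640");
    // Alef with madda (U+0622), hamza above (U+0623), hamza below (U+0625).
    private static final Pattern ALEF_VARIANTS = Pattern.compile("[\\u0622\\u0623\\u0625]");
    // Any Unicode punctuation, which includes the Arabic comma and question mark.
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Strips diacritic marks, leaving the base letters untouched. */
    public static String removeDiacritics(String text) {
        return DIACRITICS.matcher(text).replaceAll("");
    }

    /** Assumed normalization: drop tatweel and map alef variants to bare alef (U+0627). */
    public static String normalize(String text) {
        String withoutTatweel = TATWEEL.matcher(text).replaceAll("");
        return ALEF_VARIANTS.matcher(withoutTatweel).replaceAll("\u0627");
    }

    /** Replaces punctuation runs with a space, then collapses whitespace. */
    public static String removePunctuation(String text) {
        String spaced = PUNCTUATION.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(spaced).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Fully vocalized "kataba" followed by an Arabic comma and a second word.
        String input = "\u0643\u064E\u062A\u064E\u0628\u064E\u060C \u0623\u062D\u0645\u062F";
        String cleaned = removePunctuation(normalize(removeDiacritics(input)));
        System.out.println(cleaned);
    }
}
```

Each step is a separate pure function, so the steps can be applied individually or chained in whatever order a downstream tool (for example a stemmer or tagger) expects.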
Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Publisher: Association for Computational Linguistics
Pages: 4134-4138
Publication status: Published - 2014
