Readability Metrics for Machine Translation in Dutch: Google vs. Azure & IBM

Chaïm van Toledo*, Marijn Schraagen, Friso van Dijk, Matthieu Brinkhuis, Marco Spruit

*Corresponding author for this work

    Research output: Contribution to journal › Article › Academic › peer-review

    Abstract

    This paper introduces a novel method to predict when a Google translation into Dutch is better than other machine translations (MT). Instead of considering fidelity, the approach uses fluency and readability indicators to identify when Google ranked best, exploring an alternative approach in the field of quality estimation. The paper also contributes a published dataset of sentences translated from English to Dutch, with human-made classifications on a best-worst scale. Logistic regression shows a correlation between T-Scan output, such as readability measurements like lemma frequencies, and the cases in which the Google translation was better than those of Azure and IBM. The last part of the results section explores prediction, first with logistic regression and second with a model generated by automated machine learning; these reach accuracies of 0.59 and 0.61, respectively.
    Original language: English
    Article number: 4444
    Pages (from-to): 1-14
    Number of pages: 14
    Journal: Applied Sciences
    Volume: 13
    Issue number: 7
    Publication status: Published - 1 Apr 2023

    Keywords

    • English to Dutch quality estimation
    • Machine translation
    • Quality estimation
    • SQuAD 2.0
