Readability measures for machine translation in Dutch: Google vs Azure & IBM

Chaïm van Toledo, Matthieu Brinkhuis and Marco Spruit

This paper presents a novel method for predicting when a Google translation is better than other machine translations (MT) in Dutch. Rather than fidelity, the method considers fluency and readability. It uses readability measures from the T-Scan tool, such as word probability, to determine when Google MT performed better than two other MT providers. For evaluation, 213 translated sentences from Azure, IBM and Google were classified on a best-worst scale, and the best translation was picked for each. A logistic regression shows a correlation between T-Scan output and the cases in which the Google translation was better than those of Azure and IBM. The resulting machine learning model has an accuracy of 0.70.
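
As an illustration of the kind of model described above, the following minimal sketch fits a logistic regression that predicts whether the Google translation was picked as best from readability features. It is not the authors' implementation. The two feature names (a standardized word probability and a standardized sentence length), the hyperparameters, and all data rows are assumptions made for this example; the rows are synthetic and deliberately easy to separate, so the accuracy this toy reaches says nothing about the 0.70 reported in the paper.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: p(y=1 | x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # 1 = "Google translation was best", 0 = "another provider was best".
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Synthetic toy rows, NOT data from the paper.
# Hypothetical columns: [word_probability_z, sentence_length_z]
X = [
    [1.2, -0.5], [0.8, 0.3], [1.5, -1.0], [0.6, 0.1],
    [-1.1, 0.4], [-0.7, -0.2], [-1.4, 0.9], [-0.9, -0.6],
]
# Hypothetical labels: 1 if the Google translation was picked as best.
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
print("weights:", w, "bias:", b)
print("training accuracy on toy data:", accuracy(w, b, X, y))
```

In practice the feature columns would come from T-Scan output for each translated sentence and the labels from the best-worst annotation, with accuracy measured on held-out sentences rather than on the training rows as done here.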