Arda Tezcan
Quality estimation (QE) is defined as the task of predicting the quality of a machine translation (MT) output without any access to correct (reference) translations or human intervention. While earlier work on QE focused mostly on machine learning methods that rely on linguistic processing and feature engineering, state-of-the-art QE systems use end-to-end, neural-based architectures.
In sentence-level QE, neural-based predictive models commonly rely on the information provided by source-MT pairs to assess the quality of a given MT output, as the quality of the MT output is estimated in terms of its correctness with respect to the given source text. Two recent studies have shown that domain-specific NMT systems, which are trained on source-target sentence pairs, produce higher-quality translations when sentences that are highly similar to the input sentence (i.e. high fuzzy matches) are present in the data used to train the same NMT systems. These findings suggest that, in the context of domain-specific NMT, the degree of similarity between the input sentence and the NMT training data plays an important role in the quality of the translations produced by the system. In a general-domain scenario, this link has also recently been explored by integrating into a neural-based QE architecture a set of features derived from the highest similarity scores measured between input sentences and the NMT training data. However, when combined with other linguistic features, these similarity-based features led to a decrease in QE performance.
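For illustration, fuzzy match retrieval is commonly based on a token-level edit-distance similarity between the input sentence and the source sentences in the NMT training data. The following minimal sketch (an assumption for illustration; the cited studies may use a different similarity metric) computes such a score with a sequence-matching ratio and returns the training pair with the highest match:

from difflib import SequenceMatcher

def fuzzy_match_score(input_sentence: str, training_sentence: str) -> float:
    """Return a token-level similarity score in [0, 1] between two sentences."""
    return SequenceMatcher(None,
                           input_sentence.split(),
                           training_sentence.split()).ratio()

def best_fuzzy_match(input_sentence, training_pairs):
    """Return the (source, target) training pair whose source side is most
    similar to the input sentence, together with its similarity score."""
    best_pair, best_score = None, 0.0
    for src, tgt in training_pairs:
        score = fuzzy_match_score(input_sentence, src)
        if score > best_score:
            best_pair, best_score = (src, tgt), score
    return best_pair, best_score

# Toy example (hypothetical data, English-Dutch)
training_pairs = [
    ("The system restarts automatically.", "Het systeem start automatisch opnieuw op."),
    ("Press the power button.", "Druk op de aan/uit-knop."),
]
match, score = best_fuzzy_match("The system restarts manually.", training_pairs)
print(match, score)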
With the hypothesis that fuzzy matches (FMs) are informative about the quality of the NMT output for a given input sentence when highly similar source sentences can be found in the data used for training the NMT system, the current study explores their usefulness for sentence-level QE in the context of domain-specific NMT. To test this hypothesis, experiments have been performed using TransQuest, an open-source QE framework, which uses a transfer-learning approach to fine-tune the pre-trained XLM-RoBERTa (large) model. Instead of adapting the QE architecture to integrate this information in the form of linguistic features, this study proposes a simpler approach and integrates the FMs with the highest similarity score directly into the input representations, by concatenating them to the source-MT pairs during the training and test phases.
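A minimal sketch of this idea is given below, assuming a Hugging Face XLM-RoBERTa tokenizer; the exact concatenation order and separator used in the study are assumptions here, and the sketch is not the TransQuest implementation itself. The point is simply that the FM translation extends the source-MT input pair before it is fed to the cross-lingual encoder:

from typing import Optional
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-large")

def build_qe_input(source: str, mt_output: str, fm_translation: Optional[str]):
    """Encode a source-MT pair, optionally augmented with a fuzzy-match translation."""
    if fm_translation is not None:
        # Append the FM translation to the source segment (assumed strategy).
        first_segment = f"{source} {tokenizer.sep_token} {fm_translation}"
    else:
        first_segment = source
    return tokenizer(first_segment, mt_output,
                     truncation=True, max_length=256, return_tensors="pt")

# Hypothetical English-Dutch example
encoding = build_qe_input(
    "Press the power button.",
    "Druk op de aan/uit-knop.",
    "Druk op de aan/uit-knop om het apparaat in te schakelen.",
)
print(encoding["input_ids"].shape)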
Preliminary results obtained on the English-Dutch language pair and a single domain not only show that this simple method significantly improves sentence-level QE performance compared to the baseline system, which utilizes source-MT pairs, but also reveal that QE performance similar to the baseline system can be achieved by relying only on the information provided by source-FM pairs (i.e. in the absence of MT output). Moreover, experiments performed in a general-domain scenario confirm previous findings, namely that this approach leads to a decrease in QE performance. Additional experiments are currently being performed to extend the scope of this study to a different language pair and domain.