Evaluating NMT using the non-inferiority principle
Do Campo Bayón, María 
(Universitat Autònoma de Barcelona)
Sánchez-Gijón, Pilar 
(Universitat Autònoma de Barcelona)
| Imprint: |
Cambridge: Cambridge University Press, 2025 |
| Abstract: |
The aim of this article is to propose a new neural machine translation (NMT) evaluation method based on the non-inferiority principle. In order to do that, we evaluate raw machine translation (MT) in terms of naturalness, which for this research is defined as not just the lack of fluency errors but also meeting the linguistic expectations of Galician end users when reading original texts in Galician. Our main objective is, in the first place, to validate the new methodology presented in our previous study by evaluating an NMT engine from Spanish into Galician for the social media domain that was retrained with a new Twitter corpus. This new methodology and NMT engine were applied after analyzing the conclusions of a pilot survey conducted among Twitter users to evaluate their perception of tweets translated from Spanish into Galician with our NMT engine created with a corpus of tweets. As in our preliminary study, our aim is to propose a robust quality approximation method based on the reception parameters of end users' perceptions. This new survey was conducted in December of 2022 with the participation of 228 Galician-speaking Twitter users. Among the main changes proposed are the inclusion of more information about the participant profile, so the non-inferiority principle can also be evaluated according to these parameters; the inclusion of a new typology of tweets, the threads; the provision of context by means of presenting the tweets in their original display as shown in the Twitter app; a change in the number of tweets evaluated and the number of different questionnaires; the change in the distribution of the questionnaires; and the inclusion of an error classification human evaluation conducted by professional linguists to correlate the findings.
We will present the steps carried out following the conclusions of the pilot study, describe the new study's design, analyze the new findings, and present the final conclusions regarding the engine and the evaluation method based on the non-inferiority principle. Finally, we will also provide some examples of the use of this new methodology in the translation industry. |
| Rights: |
This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work, and the creation of derivative works are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged. |
| Language: |
English |
| Document: |
Article ; research ; Published version |
| Subject: |
Non-inferiority principle ;
Evaluation ;
Machine translation ;
NMT evaluation ;
Naturalness evaluation |
| Published in: |
Natural Language Processing, Vol. 31 Num. 4 (2025), p. 1042-1061, ISSN 2977-0424 |
DOI: 10.1017/nlp.2024.4
The record appears in these collections:
Articles > Research articles
Articles > Published articles
Record created 2025-02-14, last modified 2025-09-07