Optimizing Quality Estimation for Low-Resource Language Translations: Exploring the Role of Language Relatedness

Abstract

Quality Estimation (QE) is vital for determining the effectiveness of machine translation (MT) systems. This paper investigates QE for MT of low-resource Indic languages. We analyse the influence of language relatedness within linguistic families and integrate various pre-trained encoders into the MonoTransQuest (MonoTQ) framework. This entails assessing models in single-language configurations before scaling up to multiple-language setups, focusing on languages within and across families, and using approaches grounded in transfer learning. Experimental outcomes and analyses indicate that language relatedness significantly improves QE performance over the baseline, sometimes even surpassing state-of-the-art approaches. Across monolingual and multilingual configurations, we discuss strategic encoder usage as a simple measure to exploit language interactions within these models, improving QE efficiency over the baseline. This investigation underscores the potential of tailored pre-trained encoders to improve QE performance and discusses the limitations of QE approaches for low-resource scenarios.

Publication
Proceedings of the Conference New Trends in Translation and Technology 2024