Machine Translation

Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages

This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, …

What do Large Language Models Need for Machine Translation Evaluation?

Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results …
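As an illustration of what prompt-based MT evaluation with an LLM can look like, here is a minimal Python sketch. The prompt wording and the `llm_complete` client are assumptions for illustration only, not the setup studied in the paper.

```python
def llm_mt_score(source: str, translation: str, llm_complete) -> float:
    """Ask an LLM for a 0-100 quality score for a translation.
    `llm_complete` is a hypothetical text-completion callable standing in
    for whatever LLM client is in use; the prompt is an assumed example."""
    prompt = (
        "Rate the quality of this translation from 0 (useless) to 100 (perfect). "
        "Reply with a number only.\n"
        f"Source: {source}\n"
        f"Translation: {translation}\n"
        "Score:"
    )
    return float(llm_complete(prompt).strip())
```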

Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts

The tutorial describes the concept of edit distances applied to research and commercial contexts. We use Translation Edit Rate (TER), Levenshtein, Damerau-Levenshtein, Longest Common Subsequence and n-gram distances to demonstrate the frailty of …
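Since the tutorial builds on these distances, a minimal sketch of the standard dynamic-programming Levenshtein distance may help make the concept concrete. This is the textbook formulation, not code taken from the tutorial itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))               # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                               # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution (or match)
        prev = curr
    return prev[-1]

print(levenshtein("translation", "translations"))  # -> 1
```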

Optimizing Quality Estimation for Low-Resource Language Translations: Exploring the Role of Language Relatedness

Quality Estimation (QE) is vital to determine the effectiveness of MT systems. This paper investigates QE for machine translation (MT) for low-resource Indic languages. We analyse the influence of language relatedness within linguistic families and …

Evaluating Machine Translation for Emotion-loaded User Generated Content (TransEval4Emo-UGC)

This paper presents a dataset for evaluating the machine translation of emotion-loaded user-generated content. It contains human-annotated quality evaluation data and post-edited reference translations. The dataset is available at our GitHub …

Google Translate Error Analysis for Mental Healthcare Information: Evaluating Accuracy, Comprehensibility, and Implications for Multilingual Healthcare Communication

This study explores the use of Google Translate (GT) for translating mental healthcare (MHealth) information and evaluates its accuracy, comprehensibility, and implications for multilingual healthcare communication through analysing GT output in the …

APE-then-QE: Correcting then Filtering Pseudo Parallel Corpora for MT Training Data Creation

Automatic Post-Editing (APE) is the task of automatically identifying and correcting errors in Machine Translation (MT) outputs. We propose a repair-filter-use methodology that uses an APE system to correct errors on the target side of the MT …
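The repair-filter-use idea can be sketched as a simple pipeline. The functions `ape_correct` and `qe_score` below are hypothetical stand-ins for an APE model and a sentence-level QE model, and the threshold is illustrative rather than the value used in the paper.

```python
from typing import Callable, Iterable

def build_training_corpus(
    pairs: Iterable[tuple[str, str]],        # (source, machine-translated target)
    ape_correct: Callable[[str, str], str],  # hypothetical: repairs the target given the source
    qe_score: Callable[[str, str], float],   # hypothetical: estimates quality in [0, 1]
    threshold: float = 0.7,                  # illustrative cut-off, not the paper's value
) -> list[tuple[str, str]]:
    corpus = []
    for src, mt in pairs:
        repaired = ape_correct(src, mt)           # repair: post-edit the noisy target side
        if qe_score(src, repaired) >= threshold:  # filter: keep only pairs estimated as good
            corpus.append((src, repaired))        # use: add to the MT training data
    return corpus
```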

Findings of the WMT 2023 Shared Task on Automatic Post-Editing

We present the results from the 9th round of the WMT shared task on MT Automatic Post-Editing, which consists of automatically correcting the output of a “black-box” machine translation system by learning from human corrections. Like last year, the …

Findings of the WMT 2023 Shared Task on Quality Estimation

We report the results of the WMT 2023 shared task on Quality Estimation, in which the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels, without access to reference translations. …

Quality Estimation-Assisted Automatic Post-Editing

Automatic Post-Editing (APE) systems are prone to over-correction of Machine Translation (MT) outputs. While a Word-level Quality Estimation (QE) system can provide a way to curtail this over-correction, a significant performance gain has not been …
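One way word-level QE tags could curtail over-correction is to keep the MT tokens labelled "OK" and accept the APE hypothesis only where QE flags "BAD". The sketch below illustrates that gating idea under a simplifying token-alignment assumption; it is not the integration proposed in the paper.

```python
def gate_ape_with_qe(mt_tokens: list[str],
                     ape_tokens: list[str],
                     qe_tags: list[str]) -> list[str]:
    # Assumes the three sequences are token-aligned; a real system would need an
    # explicit alignment between the MT output and the APE hypothesis.
    gated = []
    for mt_tok, ape_tok, tag in zip(mt_tokens, ape_tokens, qe_tags):
        gated.append(mt_tok if tag == "OK" else ape_tok)  # trust MT unless QE says "BAD"
    return gated

mt   = "he eat the apple".split()      # MT output with one error
ape  = "he eats the apple".split()     # APE hypothesis
tags = ["OK", "BAD", "OK", "OK"]       # word-level QE tags for the MT output
print(" ".join(gate_ape_with_qe(mt, ape, tags)))  # -> "he eats the apple"
```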

SurreyAI 2023 Submission for the Quality Estimation Shared Task

Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available. This paper describes the approach adopted by the SurreyAI team for addressing the …

Challenges of Human vs Machine Translation of Emotion-Loaded Chinese Microblog Texts

This paper attempts to identify the challenges professional translators face when translating emotion-loaded texts, as well as the errors machine translation (MT) makes when translating this content. We invited ten Chinese-English translators to translate …