
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help

The WMT25 Shared Task on Automated Translation Evaluation Systems evaluates metrics and quality estimation systems that assess the quality of machine translation systems. This task unifies and consolidates the separate WMT shared tasks on Machine …

ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models

Large Language Models (LLMs) have shown remarkable performance across a wide range of natural language processing tasks. Quality Estimation (QE) for Machine Translation (MT), which assesses the quality of a source-MT pair without relying on reference …

Reference-Less Evaluation of Machine Translation: Navigating Through the Resource-Scarce Scenarios

Reference-less evaluation of machine translation, or Quality Estimation (QE), is vital for low-resource language pairs where high-quality references are often unavailable. In this study, we investigate segment-level QE methods comparing encoder-based …

Prompt-based Explainable Quality Estimation for English-Malayalam

The aim of this project was to curate data for the English-Malayalam language pair for the tasks of Quality Estimation (QE) and Automatic Post-Editing (APE) of Machine Translation. Whilst the primary aim of the project was to create a dataset for a …

Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems

Evaluating machine translation (MT) of user-generated content (UGC) involves unique challenges, such as checking whether the nuances of emotion in the source are preserved in the target text. Recent studies have proposed emotion-related datasets, …
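
The probing strategy the title describes — swapping characters for same-pronunciation alternatives to see whether QE systems notice the meaning change — can be illustrated with a toy perturbation function. The two-entry homophone map below is a hand-made illustration, not the paper's actual resource.

```python
# Tiny illustrative map of same-pronunciation Chinese character alternatives.
HOMOPHONES = {
    "在": ["再"],   # both "zài": "at/in" vs. "again"
    "做": ["作"],   # both "zuò": two "do/make" variants
}

def homophone_probes(sentence: str):
    """Return one perturbed copy of the sentence per substitutable character."""
    probes = []
    for i, ch in enumerate(sentence):
        for alt in HOMOPHONES.get(ch, []):
            probes.append(sentence[:i] + alt + sentence[i + 1:])
    return probes

src = "我在家做饭"
print(homophone_probes(src))  # → ['我再家做饭', '我在家作饭']
```

A QE system that scores the perturbed sentences as highly as the original is failing to detect a meaning-changing (though phonetically identical) substitution.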

Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing

Automatic Post-Editing (APE) systems often struggle with over-correction, where unnecessary modifications are made to a translation, diverging from the principle of minimal editing. In this paper, we propose a novel technique to mitigate …
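
One simple way to operationalize the tension this abstract describes — quality versus minimal editing — is to score each candidate edit with a QE model and subtract an edit-distance penalty. The sketch below uses a hard-coded stand-in for the QE scorer and `difflib` for the edit ratio; it is a hedged illustration of the general idea, not the constrained-decoding method proposed in the paper.

```python
from difflib import SequenceMatcher

def edit_ratio(a: str, b: str) -> float:
    """Fraction of the token sequence changed (0.0 = identical)."""
    return 1.0 - SequenceMatcher(None, a.split(), b.split()).ratio()

def pick_edit(mt: str, candidates, qe_score, alpha: float = 0.5):
    """Choose the candidate maximizing QE score minus an edit penalty.

    qe_score is a stand-in for a real QE model; alpha trades translation
    quality against the minimal-editing principle.
    """
    return max(candidates, key=lambda c: qe_score(c) - alpha * edit_ratio(mt, c))

# Toy example: the heavy rewrite has the best raw QE score, but the edit
# penalty steers selection toward the minimal grammatical fix.
mt = "he go to school yesterday"
candidates = [
    "he go to school yesterday",          # no edit
    "he went to school yesterday",        # minimal fix
    "yesterday he attended his classes",  # heavy rewrite
]
fake_qe = {"he go to school yesterday": 0.60,
           "he went to school yesterday": 0.90,
           "yesterday he attended his classes": 0.92}
best = pick_edit(mt, candidates, fake_qe.get, alpha=0.5)
print(best)  # → he went to school yesterday
```

Raising `alpha` pushes the system toward leaving the MT output untouched; lowering it allows more aggressive rewrites.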

Refer to the Reference: Reference-focused Synthetic Automatic Post-Editing Data Generation

When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages

This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality estimation (QE). Segment-level QE is a challenging cross-lingual language understanding task that provides a quality score …

A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content

Machine translation (MT) of user-generated content (UGC) poses unique challenges, including handling slang, emotion, and literary devices like irony and sarcasm. Evaluating the quality of these translations is challenging as current metrics do not …

Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?

This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations. To achieve …
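
Prompt-based QE of the kind this abstract investigates typically asks the LLM for a numeric score and then parses it out of a free-form reply. The template and parser below are hypothetical (the paper's actual prompts are not shown in this excerpt), and the model call is mocked with a fixed string.

```python
import re

# Hypothetical scoring prompt; a real system would send PROMPT.format(...)
# to an LLM API.
PROMPT = (
    "Score the translation quality from 0 (worst) to 100 (best).\n"
    "Source: {src}\nTranslation: {mt}\nScore:"
)

def parse_score(response: str):
    """Pull the first number out of a free-form LLM reply, clipped to [0, 100];
    return None if no number is present."""
    m = re.search(r"-?\d+(?:\.\d+)?", response)
    if m is None:
        return None
    return max(0.0, min(100.0, float(m.group())))

# Mocked model reply (no API call made here).
reply = "I would rate this translation 85 out of 100."
print(parse_score(reply))  # → 85.0
```

Robust parsing matters in practice: LLMs often wrap the score in explanatory text, so extracting the first number (with clipping) is a common defensive choice.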

Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE?

We report the results of the WMT 2024 shared task on Quality Estimation, in which the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels, without access to reference translations. …
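
Segment-level submissions to QE shared tasks of this kind are conventionally ranked by correlation between system scores and human judgments. As a minimal, dependency-free illustration (assuming no tied scores, for brevity), Spearman's rank correlation can be computed as follows:

```python
def spearman(pred, gold):
    """Spearman rank correlation between two score lists (assumes no ties)."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    n = len(pred)
    rp, rg = ranks(pred), ranks(gold)
    d2 = sum((a - b) ** 2 for a, b in zip(rp, rg))
    return 1 - 6 * d2 / (n * (n * n - 1))

# System scores vs. human judgments for five segments (illustrative values).
pred = [0.91, 0.40, 0.77, 0.15, 0.60]
gold = [0.88, 0.35, 0.80, 0.20, 0.55]
print(round(spearman(pred, gold), 3))  # → 1.0 (identical ranking)
```

Production evaluations (e.g. via `scipy.stats.spearmanr`) additionally handle tied ranks, which this short version omits.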