Reference-Less Evaluation of Machine Translation: Navigating Through the Resource-Scarce Scenarios

Reference-less evaluation of machine translation, or Quality Estimation (QE), is vital for low-resource language pairs where high-quality references are often unavailable. In this study, we investigate segment-level QE methods, comparing encoder-based …

What do Large Language Models Need for Machine Translation Evaluation?

Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results …

Evaluating Machine Translation for Emotion-loaded User Generated Content (TransEval4Emo-UGC)

This paper presents a dataset for evaluating the machine translation of emotion-loaded user-generated content. It contains human-annotated quality evaluation data and post-edited reference translations. The dataset is available at our GitHub …

Findings of the WMT 2022 Shared Task on Quality Estimation

We report the results of the WMT 2022 shared task on Quality Estimation, in which the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels, without access to reference translations. …

HiNER: A Large Hindi Named Entity Recognition Dataset

Named Entity Recognition (NER) is a foundational NLP task that aims to provide class labels like Person, Location, Organisation, Time, and Number to words in free text. Named Entities can also be multi-word expressions where the additional I-O-B …

PLOD: An Abbreviation Detection Dataset for Scientific Documents

The detection and extraction of abbreviations from unstructured texts can help to improve the performance of Natural Language Processing tasks, such as machine translation and information retrieval. However, in terms of publicly available datasets, …

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translation outputs that can contain important meaning errors, thus undermining their …