Publications

Reference-less evaluation of machine translation, or Quality Estimation (QE), is vital for low-resource language pairs where …

Detecting cyberbullying on social media remains a critical challenge due to its subtle and varied expressions. This study investigates …

Visual metaphor generation is a challenging task that aims to generate an image given an input text metaphor. Inherently, it needs …

Despite large language models (LLMs) being known to exhibit bias against non-mainstream varieties, there are no known labeled datasets …

E-commerce information retrieval (IR) systems struggle to simultaneously achieve high accuracy in interpreting complex user queries and …

The aim of this project was to curate data for the English-Malayalam language pair for the tasks of Quality Estimation (QE) and …

Social media platforms enable the propagation of hateful content across different modalities such as textual, auditory, and visual, …

Evaluating machine translation (MT) of user-generated content (UGC) involves unique challenges such as checking whether the nuance of …

Automatic Post-Editing (APE) systems often struggle with over-correction, where unnecessary modifications are made to a translation, …

Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound. Current …

In music-driven dance motion generation, most existing methods use hand-crafted features and neglect that music foundation models have …

State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report superlative performance …

This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality …

Audio-driven talking face generation is a challenging task in digital communication. Despite significant progress in the area, most …

Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound …

Audio-to-talking face generation stands at the forefront of advancements in generative AI. It bridges the gap between audio and visual …

This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of …

Machine translation (MT) of user-generated content (UGC) poses unique challenges, including handling slang, emotion, and literary …

Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their …

This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of …

This paper addresses the challenge of improving user experience on e-commerce platforms by enhancing product ranking relevant to …

This talk addresses the challenge of improving user experience on e-commerce platforms by enhancing product ranking relevant to …

This paper investigates data sampling strategies to create a benchmark for dialectal sentiment classification of Google Places reviews …

Existing benchmarks often fail to account for linguistic diversity, like language variants of English. In this paper, we share our …

The tutorial describes the concept of edit distances applied to research and commercial contexts. We use Translation Edit Rate (TER), …

Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages …

Sarcasm is a rhetorical device that is used to convey the opposite of the literal meaning of an utterance. Sarcasm is widely used on …

Quality Estimation (QE) is vital to determine the effectiveness of MT systems. This paper investigates QE for machine translation (MT) …

This paper presents a dataset for evaluating the machine translation of emotion-loaded user-generated content. It contains …

Social media, a vast platform for communication and entertainment, is unfortunately an ideal breeding ground for cyberbullying. While …

Abbreviations and their associated long forms are important textual elements that are present in almost every scientific communication, …

Sound Event Detection (SED) aims to predict the temporal boundaries of all the events of interest and their class labels, given an …

Automatic Post-Editing (APE) is the task of automatically identifying and correcting errors in the Machine Translation (MT) outputs. We …

Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is …

Automatic Post-Editing (APE) systems are prone to over-correction of the Machine Translation (MT) outputs. While Word-level Quality …

Sarcasm is a complex linguistic construct with incongruity at its very core. Detecting sarcasm depends on the actual content spoken and …

We report the results of the WMT 2023 shared task on Quality Estimation, in which the challenge is to predict the quality of the output …

We present the results from the 9th round of the WMT shared task on MT Automatic Post-Editing, which consists of automatically …

This paper attempts to identify challenges professional translators face when translating emotion-loaded texts as well as errors …

Audio-Visual Segmentation (AVS) aims to precisely outline audible objects in a visual scene at the pixel level. Existing AVS methods …

Cyberbullying is a serious societal issue widespread on various channels and platforms, particularly social networking sites. Such …

Recent years have seen a proliferation of aggressive social media posts, often with real-world consequences for victims. …

Quality Estimation (QE) is the task of evaluating machine translation output in the absence of reference translation. Conventional …

Sentiment analysis has benefited from the availability of lexicons and benchmark datasets created over decades of research. However, …

Sarcasm is prevalent in all corners of social media, posing many challenges within Natural Language Processing (NLP), particularly for …

We present the results from the 8th round of the WMT shared task on MT Automatic Post-Editing, which consists in automatically …

Social media platforms have become new battlegrounds for anti-social elements, with misinformation being the weapon of choice. …

The detection and extraction of abbreviations from unstructured texts can help to improve the performance of Natural Language …

Named Entity Recognition (NER) is a foundational NLP task that aims to provide class labels like Person, Location, Organisation, Time, …

This paper summarises the submissions our team, SURREY-CTS-NLP, has made for the WASSA 2022 Shared Task for the prediction of empathy, …

Acronyms are abbreviated units of a phrase constructed by using initial components of the phrase in a text. Automatic extraction of …

Fake news, misinformation, and unverifiable facts on social media platforms propagate disharmony and affect society, especially when …

Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they …

Computational Humour (CH) has attracted the interest of Natural Language Processing and Computational Linguistics communities. Creating …

Given a noun compound (NC), we address the problem of predicting the appropriate semantic label linking the constituents of the NC. …

Automatic detection of cognates helps downstream NLP tasks of Machine Translation, Cross-lingual Information Retrieval, Computational …

Automatic essay grading (AEG) is a process in which machines assign a grade to an essay written in response to a topic, called the …

Gaze behaviour has been used as a way to gather cognitive information for a number of years. In this paper, we discuss the use of gaze …

Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain …

Cognates are present in multiple variants of the same text across different languages (e.g., hund in German and hound in English …

Dense word vectors, or ‘word embeddings’, which encode semantic properties of words, have now become integral to NLP tasks …

This paper describes additional aspects of a digital tool called the ‘Textual History Tool’. We describe its various salient features …

Establishing language relatedness by inferring phylogenetic trees has been a topic of interest in the area of diachronic linguistics. …

Automatic Cognate Detection helps NLP tasks of Machine Translation, Information Retrieval, and Phylogenetics. Cognate words are defined …

Tracing the root of a text, i.e., the original version of the text, by inferring phylogenetic trees has been a topic of interest in …

Automatic Cognate Detection (ACD) is a challenging task which has been utilized to help NLP applications like Machine Translation, …

This paper describes a digital tool called the Textual History Tool in detail. This tool captures the historical evolution of a text …

In today’s digital world, language technology has gained importance. Several software tools have been developed and are available in the …

Cognates are present in multiple variants of the same text across different languages. Computational Phylogenetics uses algorithms and …

In this paper, we describe our work on the creation of a voice model using a speech synthesis system for the Hindi Language. We use …

Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of …

Indian language WordNets have their individual web-based browsing interfaces along with a common interface for IndoWordNet. These …

A sentence is an important notion in the Indian grammatical tradition. The collection of the definitions of a sentence can be found in …

Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of …

This paper reports the work related to making Hindi Wordnet available as a digital resource for language learning and teaching, and …

Predicting a reader’s rating of text quality is a challenging task that involves estimating different subjective aspects of the …

Measuring reading effort is useful for practical purposes such as designing learning material and personalizing text comprehension …

Sarcasm Suite is a browser-based engine that deploys five of our past papers in sarcasm detection and generation. The sarcasm detection …

We present a quantitative, data-driven machine learning approach to mitigate the problem of unpredictability of Computer Science …

Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In the absence …

India is a country with 22 officially recognized languages and 17 of these have WordNets, a crucial resource. Web browser based …

We present a WordNet-like structured resource for slang words and neologisms on the internet. The dynamism of language is often an …

Sarcasm understandability, or the ability to understand textual sarcasm, depends upon readers’ language proficiency, social knowledge, …

This paper reports the work of creating bilingual mappings in English for certain synsets of Hindi wordnet, the need for doing this, …

Sentiments expressed in user-generated short text and sentences are nuanced by subtleties at lexical, syntactic, semantic and pragmatic …

In this paper, we propose a novel mechanism for enriching the feature vector, for the task of sarcasm detection, with cognitive …

We present the Civique system for emergency detection in urban areas by monitoring microblogs such as tweets. The system detects …

WordNet has proved to be immensely useful for Word Sense Disambiguation, and thence Machine Translation, Information Retrieval and …

WordNet is an online lexical resource which expresses unique concepts in a language. English WordNet is the first WordNet which was …

Parallel corpora are often injected with bilingual dictionaries for improved Indian language machine translation (MT). In the absence of …

We present TransChat, an open-source, cross-platform, Indian language Instant Messaging (IM) application that facilitates cross-lingual …

We present our work on developing fifteen Hierarchical Phrase-Based Statistical Machine Translation (HPBSMT) systems for five Indian …

We present a Parallel Corpora Management tool that aids parallel corpora generation for the task of Machine Translation (MT). It takes …

The task of Word Sense Disambiguation (WSD) incorporates in its definition the role of ‘context’. We present our work on the …

Word Sense Disambiguation (WSD) approaches have reported good accuracies in recent years. However, these approaches can be classified …

Current state-of-the-art Word Sense Disambiguation (WSD) algorithms are mostly supervised and use the P(Sense|Word) statistic for …

Does context help determine sense? This question might seem frivolous, even preposterous to anybody sensible. However, our long time …