Diptesh Kanojia
दिप्तेश कनोजिया
Publications
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help
The WMT25 Shared Task on Automated Translation Evaluation Systems evaluates metrics and quality estimation systems that assess the …
Alon Lavie
,
Greg Hanneman
,
Sweta Agrawal
,
Diptesh Kanojia
,
Chi-kiu Lo
,
Vilém Zouhar
,
Frédéric Blain
,
Chrysoula Zerva
,
Eleftherios Avramidis
,
Sourabh Dattatray Deoghare
,
Archchana Sindhujan
,
Jiayi Wang
,
David Ifeoluwa Adelani
,
Brian Thompson
,
Tom Kocmi
,
Markus Freitag
,
Daniel Deutsch
PDF
Cite
Reference-Less Evaluation of Machine Translation: Navigating Through the Resource-Scarce Scenarios
Reference-less evaluation of machine translation, or Quality Estimation (QE), is vital for low-resource language pairs where …
Archchana Sindhujan
,
Diptesh Kanojia
,
Constantin Orăsan
PDF
Cite
ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models
Large Language Models (LLMs) have shown remarkable performance across a wide range of natural language processing tasks. Quality …
Archchana Sindhujan
,
Shenbin Qian
,
Chan Chi Chun Matthew
,
Constantin Orăsan
,
Diptesh Kanojia
Preprint
PDF
Cite
Code
Dataset
Poster
Slides
Cyberbullying Detection via Aggression-Enhanced Prompting
Detecting cyberbullying on social media remains a critical challenge due to its subtle and varied expressions. This study investigates …
Aisha Saeid
,
Anu Sabu
,
Girish A Koushik
,
Ferrante Neri
,
Diptesh Kanojia
Preprint
PDF
Cite
Poster
The Mind's Eye: A Multi-Faceted Reward Framework for Guiding Visual Metaphor Generation
Visual metaphor generation is a challenging task that aims to generate an image given an input text metaphor. Inherently, it needs …
Girish A Koushik
,
Fatemeh Nazarieh
,
Katherine Birch
,
Shenbin Qian
,
Diptesh Kanojia
Preprint
PDF
Cite
BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English
Despite large language models (LLMs) being known to exhibit bias against non-mainstream varieties, there are no known labeled datasets …
Dipankar Srirag
,
Aditya Joshi
,
Jordan Painter
,
Diptesh Kanojia
PDF
Cite
Code
Dataset
Poster
NEAR²: A Nested Embedding Approach to Efficient Product Retrieval and Ranking
E-commerce information retrieval (IR) systems struggle to simultaneously achieve high accuracy in interpreting complex user queries and …
Shenbin Qian
,
Diptesh Kanojia
,
Samarth Agrawal
,
Hadeel Saadany
,
Swapnil Bhosale
,
Constantin Orăsan
,
Zhe Wu
Preprint
PDF
Cite
Poster
Prompt-based Explainable Quality Estimation for English-Malayalam
The aim of this project was to curate data for the English-Malayalam language pair for the tasks of Quality Estimation (QE) and …
Archchana Sindhujan
,
Diptesh Kanojia
,
Constantin Orăsan
PDF
Cite
Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content
Social media platforms enable the propagation of hateful content across different modalities such as textual, auditory, and visual, …
Girish A Koushik
,
Diptesh Kanojia
,
Helen Treharne
PDF
Cite
Code
Slides
Video
Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems
Evaluating machine translation (MT) of user-generated content (UGC) involves unique challenges such as checking whether the nuance of …
Shenbin Qian
,
Constantin Orăsan
,
Diptesh Kanojia
,
Félix Do Carmo
Preprint
PDF
Cite
Code
Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing
Automatic Post-Editing (APE) systems often struggle with over-correction, where unnecessary modifications are made to a translation, …
Sourabh Deoghare
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
CAMU: Context Augmentation for Meme Understanding
Girish A Koushik
,
Diptesh Kanojia
,
Helen Treharne
,
Aditya Joshi
Preprint
PDF
Cite
Unsupervised Audio-Visual Segmentation with Modality Alignment
Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound. Current …
Swapnil Bhosale
,
Haosen Yang
,
Diptesh Kanojia
,
Jiankang Deng
,
Xiatian Zhu
Preprint
PDF
Cite
GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music
Xinran Liu
,
Xu Dong
,
Diptesh Kanojia
,
Wenwu Wang
,
Zhenhua Feng
Preprint
PDF
Cite
DGFM: Full Body Dance Generation Driven by Music Foundation Models
In music-driven dance motion generation, most existing methods use hand-crafted features and neglect that music foundation models have …
Xinran Liu
,
Zhenhua Feng
,
Diptesh Kanojia
,
Wenwu Wang
Preprint
PDF
Cite
Natural Language Processing for Dialects of a Language: A Survey
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance …
Aditya Joshi
,
Raj Dabre
,
Diptesh Kanojia
,
Zhuang Li
,
Haolan Zhan
,
Gholamreza Haffari
,
Doris Dippold
Preprint
PDF
Cite
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages
This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality …
Archchana Sindhujan
,
Diptesh Kanojia
,
Constantin Orăsan
,
Shenbin Qian
PDF
Cite
Dataset
Refer to the Reference: Reference-focused Synthetic Automatic Post-Editing Data Generation
Sourabh Dattatray Deoghare
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Cite
PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation
Audio-driven talking face generation is a challenging task in digital communication. Despite significant progress in the area, most …
Fatemeh Nazarieh
,
Zhenhua Feng
,
Diptesh Kanojia
,
Muhammad Awais
,
Josef Kittler
Preprint
PDF
Cite
AV-GS: Learning Material and Geometry aware Priors for Novel View Acoustic Synthesis
Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound …
Swapnil Bhosale
,
Haosen Yang
,
Diptesh Kanojia
,
Jiankang Deng
,
Xiatian Zhu
Preprint
PDF
Cite
Poster
Slides
StableTalk: Advancing Audio-to-Talking Face Generation with Stable Diffusion and Vision Transformer
Audio-to-talking face generation stands at the forefront of advancements in generative AI. It bridges the gap between audio and visual …
Fatemeh Nazarieh
,
Josef Kittler
,
Muhammad Awais Rana
,
Diptesh Kanojia
,
Zhenhua Feng
PDF
Cite
Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE?
We report the results of the WMT 2024 shared task on Quality Estimation, in which the challenge is to predict the quality of the output …
Chrysoula Zerva
,
Frederic Blain
,
José G. C. De Souza
,
Diptesh Kanojia
,
Sourabh Deoghare
,
Nuno M. Guerreiro
,
Giuseppe Attanasio
,
Ricardo Rei
,
Constantin Orăsan
,
Matteo Negri
,
Marco Turchi
,
Rajen Chatterjee
,
Pushpak Bhattacharyya
,
Markus Freitag
,
André Martins
PDF
Cite
Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?
This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of …
Shenbin Qian
,
Constantin Orăsan
,
Diptesh Kanojia
,
Félix Do Carmo
Preprint
PDF
Cite
Code
Dataset
A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content
Machine translation (MT) of user-generated content (UGC) poses unique challenges, including handling slang, emotion, and literary …
Shenbin Qian
,
Constantin Orăsan
,
Diptesh Kanojia
,
Félix Do Carmo
Preprint
PDF
Cite
Code
Dataset
What do Large Language Models Need for Machine Translation Evaluation?
Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their …
Shenbin Qian
,
Archchana Sindhujan
,
Minnie Kabra
,
Diptesh Kanojia
,
Constantin Orăsan
,
Tharindu Ranasinghe
,
Fred Blain
Preprint
PDF
Cite
Code
Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages
This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of …
Sourabh Deoghare
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Code
Centrality-aware Product Retrieval and Ranking
This paper addresses the challenge of improving user experience on e-commerce platforms by enhancing product ranking relevant to …
Hadeel Saadany
,
Swapnil Bhosale
,
Samarth Agrawal
,
Diptesh Kanojia
,
Constantin Orăsan
,
Zhe Wu
PDF
Cite
Poster
Slides
Product Retrieval and Ranking for Alphanumeric Queries
This talk addresses the challenge of improving user experience on e-commerce platforms by enhancing product ranking relevant to …
Hadeel Saadany
,
Swapnil Bhosale
,
Samarth Agrawal
,
Zhe Wu
,
Constantin Orăsan
,
Diptesh Kanojia
PDF
Cite
Sampling Strategies for Creation of a Benchmark for Dialectal Sentiment Classification
This paper investigates data sampling strategies to create a benchmark for dialectal sentiment classification of Google Places reviews …
Dipankar Srirag
,
Jordan Painter
,
Aditya Joshi
,
Diptesh Kanojia
Preprint
PDF
Cite
Experiences from Creating a Benchmark for Sentiment Classification for Varieties of English
Existing benchmarks often fail to account for linguistic diversity, like language variants of English. In this paper, we share our …
Dipankar Srirag
,
Jordan Painter
,
Aditya Joshi
,
Diptesh Kanojia
Preprint
PDF
Cite
Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts
The tutorial describes the concept of edit distances applied to research and commercial contexts. We use Translation Edit Rate (TER), …
Félix do Carmo
,
Diptesh Kanojia
Preprint
PDF
Cite
Code
Slides
Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios
Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages …
Aditya Joshi
,
Diptesh Kanojia
,
Heather Lent
,
Hour Kaing
,
Haiyue Song
Preprint
PDF
Cite
Slides
A Survey of Multimodal Sarcasm Detection
Sarcasm is a rhetorical device that is used to convey the opposite of the literal meaning of an utterance. Sarcasm is widely used on …
Shafkat Farabi
,
Tharindu Ranasinghe
,
Diptesh Kanojia
,
Yu Kong
,
Marcos Zampieri
Preprint
PDF
Cite
Optimizing Quality Estimation for Low-Resource Language Translations: Exploring the Role of Language Relatedness
Quality Estimation (QE) is vital to determine the effectiveness of MT systems. This paper investigates QE for machine translation (MT) …
Archchana Sindhujan
,
Diptesh Kanojia
,
Constantin Orăsan
PDF
Cite
Evaluating Machine Translation for Emotion-loaded User Generated Content (TransEval4Emo-UGC)
This paper presents a dataset for evaluating the machine translation of emotion-loaded user generated content. It contains …
Shenbin Qian
,
Constantin Orăsan
,
Félix Do Carmo
,
Diptesh Kanojia
PDF
Cite
Code
Dataset
Decoding Cyberbullying on Social Media: A Machine Learning Exploration
Social media, a vast platform for communication and entertainment, unfortunately, is an ideal breeding ground for cyberbullying. While …
Aisha Saeid
,
Diptesh Kanojia
,
Ferrante Neri
PDF
Cite
Using character-level models for efficient abbreviation and long-form detection
Abbreviations and their associated long forms are important textual elements that are present in almost every scientific communication, …
Leonardo Zilio
,
Shenbin Qian
,
Diptesh Kanojia
,
Constantin Orăsan
PDF
Cite
Code
Dataset
DiffSED: Sound Event Detection with Denoising Diffusion
Sound Event Detection (SED) aims to predict the temporal boundaries of all the events of interest and their class labels, given an …
Swapnil Bhosale
,
Sauradip Nag
,
Diptesh Kanojia
,
Jiankang Deng
,
Xiatian Zhu
PDF
Cite
Code
Poster
Slides
Google Translate Error Analysis for Mental Healthcare Information: Evaluating Accuracy, Comprehensibility, and Implications for Multilingual Healthcare Communication
This study explores the use of Google Translate (GT) for translating mental healthcare (MHealth) information and evaluates its …
Jaleh Delfani
,
Constantin Orăsan
,
Hadeel Saadany
,
Ozlem Temizoz
,
Eleanor Taylor-Stilgoe
,
Diptesh Kanojia
,
Sabine Braun
,
Barbara Schouten
Preprint
PDF
Cite
Airavata: Introducing Hindi Instruction-tuned LLM
Jay Gala
,
Thanmay Jayakumar
,
Jaavid Aktar Husain
,
Mohammed Safi Ur Rahman Khan
,
Diptesh Kanojia
,
Ratish Puduppully
,
Mitesh M Khapra
,
Raj Dabre
,
Rudra Murthy
,
Anoop Kunchukuttan
Preprint
PDF
Cite
Code
Dataset
CreoleVal: Multilingual Multitask Benchmarks for Creoles
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the …
Heather Lent
,
Kushal Tatariya
,
Raj Dabre
,
Yiyi Chen
,
Marcell Fekete
,
Esther Ploeger
,
Li Zhou
,
Ruth-Ann Armstrong
,
Abee Eijansantos
,
Catriona Malau
,
Hans Erik Heje
,
Ernests Lavrinovics
,
Diptesh Kanojia
,
Paul Belony
,
Marcel Bollmann
,
Loïc Grobol
,
Miryam de Lhoneux
,
Daniel Hershcovich
,
Michel DeGraff
,
Anders Søgaard
,
Johannes Bjerva
PDF
Cite
APE-then-QE: Correcting then Filtering Pseudo Parallel Corpora for MT Training Data Creation
Automatic Post-Editing (APE) is the task of automatically identifying and correcting errors in the Machine Translation (MT) outputs. We …
Akshay Batheja
,
Sourabh Deoghare
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
SurreyAI 2023 Submission for the Quality Estimation Shared Task
Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is …
Archchana Sindhujan
,
Diptesh Kanojia
,
Constantin Orăsan
,
Tharindu Ranasinghe
PDF
Cite
Quality Estimation-Assisted Automatic Post-Editing
Automatic Post-Editing (APE) systems are prone to over-correction of the Machine Translation (MT) outputs. While Word-level Quality …
Sourabh Deoghare
,
Diptesh Kanojia
,
Fred Blain
,
Tharindu Ranasinghe
,
Pushpak Bhattacharyya
PDF
Cite
Predict and Use: Harnessing Predicted Gaze to Improve Multimodal Sarcasm Detection
Sarcasm is a complex linguistic construct with incongruity at its very core. Detecting sarcasm depends on the actual content spoken and …
Divyank Tiwari
,
Diptesh Kanojia
,
Anupama Ray
,
Apoorva Nunna
,
Pushpak Bhattacharyya
PDF
Cite
Video
Findings of the WMT 2023 Shared Task on Quality Estimation
We report the results of the WMT 2023 shared task on Quality Estimation, in which the challenge is to predict the quality of the output …
Frederic Blain
,
Chrysoula Zerva
,
Ricardo Rei
,
Nuno M. Guerreiro
,
Diptesh Kanojia
,
José G. C. de Souza
,
Beatriz Silva
,
Tânia Vaz
,
Yan Jingxuan
,
Fatemeh Azadi
,
Constantin Orăsan
,
André Martins
PDF
Cite
Findings of the WMT 2023 Shared Task on Automatic Post-Editing
We present the results from the 9th round of the WMT shared task on MT Automatic Post-Editing, which consists of automatically …
Pushpak Bhattacharyya
,
Rajen Chatterjee
,
Markus Freitag
,
Diptesh Kanojia
,
Matteo Negri
,
Marco Turchi
PDF
Cite
Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection
The introduction of the MUStARD dataset, and its emotion recognition extension MUStARD++, have identified sarcasm to be a multi-modal …
Swapnil Bhosale
,
Abhra Chaudhuri
,
Alex Lee Robert Williams
,
Divyank Tiwari
,
Anjan Dutta
,
Xiatian Zhu
,
Pushpak Bhattacharyya
,
Diptesh Kanojia
Preprint
PDF
Cite
Code
Dataset
Challenges of Human vs Machine Translation of Emotion-Loaded Chinese Microblog Texts
This paper attempts to identify challenges professional translators face when translating emotion-loaded texts as well as errors …
Shenbin Qian
,
Constantin Orăsan
,
Félix do Carmo
,
Diptesh Kanojia
PDF
Cite
Leveraging Foundation Models for Unsupervised Audio-Visual Segmentation
Audio-Visual Segmentation (AVS) aims to precisely outline audible objects in a visual scene at the pixel level. Existing AVS methods …
Swapnil Bhosale
,
Haosen Yang
,
Diptesh Kanojia
,
Xiatian Zhu
Preprint
PDF
Cite
Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying
Cyberbullying is a serious societal issue widespread on various channels and platforms, particularly social networking sites. Such …
Nazia Nafis
,
Diptesh Kanojia
,
Naveen Saini
,
Rudra Murthy
PDF
Cite
Modelling Political Aggression on Social Media Platforms
Recent years have seen a proliferation of aggressive social media posts, often wreaking even real-world consequences for victims. …
Akash Rawat
,
Nazia Nafis
,
Dnyaneshwar Bhadane
,
Diptesh Kanojia
,
Rudra Murthy
PDF
Cite
Video
A Multi-task Learning Framework for Quality Estimation
Quality Estimation (QE) is the task of evaluating machine translation output in the absence of reference translation. Conventional …
Sourabh Deoghare
,
Paramveer Choudhary
,
Diptesh Kanojia
,
Tharindu Ranasinghe
,
Pushpak Bhattacharyya
,
Constantin Orăsan
PDF
Cite
Video
Evaluation of Chinese-English Machine Translation of Emotion-Loaded Microblog Texts: A Human Annotated Dataset for the Quality Assessment of Emotion Translation
In this paper, we focus on how current Machine Translation (MT) engines perform on the translation of emotion-loaded texts by …
Shenbin Qian
,
Constantin Orăsan
,
Felix Do Carmo
,
Qiuliang Li
,
Diptesh Kanojia
Preprint
PDF
Cite
Applications and Challenges of Sentiment Analysis in Real-life Scenarios
Sentiment analysis has benefited from the availability of lexicons and benchmark datasets created over decades of research. However, …
Diptesh Kanojia
,
Aditya Joshi
PDF
Cite
Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset
Sarcasm is prevalent in all corners of social media, posing many challenges within Natural Language Processing (NLP), particularly for …
Jordan Painter
,
Helen Treharne
,
Diptesh Kanojia
PDF
Code
Dataset
Slides
Findings of the WMT 2022 Shared Task on Quality Estimation
We report the results of the WMT 2022 shared task on Quality Estimation, in which the challenge is to predict the quality of the output …
Chrysoula Zerva
,
Frédéric Blain
,
Ricardo Rei
,
Piyawat Lertvittayakumjorn
,
José G. C. de Souza
,
Steffen Eger
,
Diptesh Kanojia
,
Duarte Alves
,
Constantin Orăsan
,
Marina Fomicheva
,
André F. T. Martins
,
Lucia Specia
PDF
Dataset
Slides
Source Document
Findings of the WMT 2022 Shared Task on Automatic Post-Editing
We present the results from the 8th round of the WMT shared task on MT Automatic Post-Editing, which consists in automatically …
Pushpak Bhattacharyya
,
Rajen Chatterjee
,
Markus Freitag
,
Diptesh Kanojia
,
Matteo Negri
,
Marco Turchi
PDF
Code
Dataset
Slides
Harnessing Abstractive Summarization for Fact-Checked Claim Detection
Social media platforms have become new battlegrounds for anti-social elements, with misinformation being the weapon of choice. …
Varad Bhatnagar
,
Diptesh Kanojia
,
Kameswari Chebrolu
Preprint
PDF
Cite
Code
Dataset
Slides
Video
PLOD: An Abbreviation Detection Dataset for Scientific Documents
The detection and extraction of abbreviations from unstructured texts can help to improve the performance of Natural Language …
Leonardo Zilio
,
Hadeel Saadany
,
Prashant Sharma
,
Diptesh Kanojia
,
Constantin Orăsan
Preprint
PDF
Cite
Code
Dataset
Slides
Video
HiNER: A Large Hindi Named Entity Recognition Dataset
Named Entity Recognition (NER) is a foundational NLP task that aims to provide class labels like Person, Location, Organisation, Time, …
Rudra Murthy
,
Pallab Bhattacharjee
,
Rahul Sharnagat
,
Jyotsana Khatri
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Code
Dataset
Poster
Video
SURREY-CTS-NLP at WASSA2022: An Experiment of Discourse and Sentiment Analysis for the Prediction of Empathy, Distress and Emotion
This paper summarises the submissions our team, SURREY-CTS-NLP has made for the WASSA 2022 Shared Task for the prediction of empathy, …
Shenbin Qian
,
Constantin Orăsan
,
Diptesh Kanojia
,
Hadeel Saadany
,
Félix Do Carmo
PDF
Cite
An Ensemble Approach to Acronym Extraction using Transformers
Acronyms are abbreviated units of a phrase constructed by using initial components of the phrase in a text. Automatic extraction of …
Prashant Sharma
,
Hadeel Saadany
,
Leonardo Zilio
,
Diptesh Kanojia
,
Constantin Orăsan
Preprint
PDF
Cite
Code
Automated Evidence Collection for Fake News Detection
Fake news, misinformation, and unverifiable facts on social media platforms propagate disharmony and affect society, especially when …
Mrinal Rawat
,
Diptesh Kanojia
Preprint
PDF
Cite
Code
Dataset
Slides
Video
Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation
Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they …
Diptesh Kanojia
,
Marina Fomicheva
,
Tharindu Ranasinghe
,
Frédéric Blain
,
Constantin Orăsan
,
Lucia Specia
Preprint
PDF
Cite
Code
Dataset
Slides
Video
'So You Think You’re Funny?': Rating the Humour Quotient in Standup Comedy
Computational Humour (CH) has attracted the interest of Natural Language Processing and Computational Linguistics communities. Creating …
Anirudh Mittal
,
Pranav Jeevan
,
Prerak Gandhi
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Code
Dataset
Poster
Slides
Video
FrameNet-assisted Noun Compound Interpretation
Given a noun compound (NC), we address the problem of predicting the appropriate semantic label linking the constituents of the NC. …
Girishkumar Ponkiya
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
,
Girish Palshikar
PDF
Cite
Dataset
Cognition-aware Cognate Detection
Automatic detection of cognates helps downstream NLP tasks of Machine Translation, Cross-lingual Information Retrieval, Computational …
Diptesh Kanojia
,
Prashant Sharma
,
Sayali Ghodekar
,
Pushpak Bhattacharyya
,
Gholamreza Haffari
,
Malhar Kulkarni
Preprint
PDF
Cite
Code
Poster
Slides
Video
Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages
Cognates are variants of the same lexical form across different languages; for example ‘fonema’ in Spanish and …
Diptesh Kanojia
,
Raj Dabre
,
Shubham Dewangan
,
Pushpak Bhattacharyya
,
Gholamreza Haffari
,
Malhar Kulkarni
Preprint
PDF
Cite
Dataset
Slides
Video
Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour
The gaze behaviour of a reader is helpful in solving several NLP tasks such as automatic essay grading. However, collecting gaze …
Sandeep Mathias
,
Rudra Murthy
,
Diptesh Kanojia
,
Abhijit Mishra
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Dataset
Video
Cognitively Aided Zero-Shot Automatic Essay Grading
Automatic essay grading (AEG) is a process in which machines assign a grade to an essay written in response to a topic, called the …
Sandeep Mathias
,
Rudra Murthy
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Dataset
Slides
A Survey on Using Gaze Behaviour for Natural Language Processing
Gaze behaviour has been used as a way to gather cognitive information for a number of years. In this paper, we discuss the use of gaze …
Sandeep Mathias
,
Diptesh Kanojia
,
Abhijit Mishra
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Poster
Slides
Video
Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study
Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain …
Akash Sheoran
,
Diptesh Kanojia
,
Aditya Joshi
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Dataset
Challenge Datasets of Cognate and False Friend Pairs for Indian Languages
Cognates are present in multiple variants of the same text across different languages (e.g., hund in German and hound in English …
Diptesh Kanojia
,
Pushpak Bhattacharyya
,
Malhar Kulkarni
,
Gholamreza Haffari
Preprint
PDF
Cite
Dataset
"A Passage to India": Pre-trained Word Embeddings for Indian Languages
Dense word vectors or ‘word embeddings’ which encode semantic properties of words, have now become integral to NLP tasks …
Kumar Saurav
,
Kumar Saunack
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Code
Dataset
Strategies of Effective Digitization of Commentaries and Sub-commentaries: Towards the Construction of Textual History
This paper describes additional aspects of a digital tool called the ‘Textual History Tool’. We describe its various salient features …
Diptesh Kanojia
,
Malhar Kulkarni
,
Sayali Ghodekar
,
Eivind Kahrs
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Project
Slides
Harnessing Deep Cross-lingual Word Embeddings to Infer Accurate Phylogenetic Trees
Establishing language relatedness by inferring phylogenetic trees has been a topic of interest in the area of diachronic linguistics. …
Yashasvi Mantha
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
,
Malhar Kulkarni
PDF
Cite
Poster
"Keep Your Dimensions on a Leash": True Cognate Detection using Siamese Deep Neural Networks
Automatic Cognate Detection helps NLP tasks of Machine Translation, Information Retrieval, and Phylogenetics. Cognate words are defined …
Diptesh Kanojia
,
Sravan Munukutla
,
Sayali Ghodekar
,
Pushpak Bhattacharyya
,
Malhar Kulkarni
PDF
Cite
Code
Dataset
Poster
Utilizing Word Embeddings based Features for Phylogenetic Tree Generation of Sanskrit Texts
Tracing the root of a text i.e., the original version of the text, by inferring phylogenetic trees has been a topic of interest in …
Diptesh Kanojia
,
Abhijeet Dubey
,
Malhar Kulkarni
,
Pushpak Bhattacharyya
,
Gholamreza Haffari
PDF
Cite
Slides
Utilizing Wordnets for Cognate Detection among Indian Languages
Automatic Cognate Detection (ACD) is a challenging task which has been utilized to help NLP applications like Machine Translation, …
Diptesh Kanojia
,
Kevin Patel
,
Pushpak Bhattacharyya
,
Malhar Kulkarni
,
Gholamreza Haffari
Preprint
PDF
Cite
Slides
An Introduction to the Textual History Tool
This paper describes a digital tool called the Textual History Tool in detail. This tool captures the historical evolution of a text …
Diptesh Kanojia
,
Malhar Kulkarni
,
Pushpak Bhattacharyya
,
Sayali Ghodekar
,
Irawati Kulkarni
,
Nilesh Joshi
,
Eivind Kahrs
PDF
Cite
Project
Slides
Some Strategies to Capture Karaka-Yogyata with Special Reference to apadana
In today’s digital world, language technology has gained importance. Several software tools have been developed and are available in the …
Swaraja Salaskar
,
Diptesh Kanojia
,
Malhar Kulkarni
Preprint
PDF
Cite
Poster
Cognate Identification to improve Phylogenetic trees for Indian Languages
Cognates are present in multiple variants of the same text across different languages. Computational Phylogenetics uses algorithms and …
Diptesh Kanojia
,
Malhar Kulkarni
,
Pushpak Bhattacharyya
,
Gholamreza Haffari
PDF
Cite
Poster
Slides
Synthesizing Audio for Hindi Wordnet
In this paper, we describe our work on the creation of a voice model using a speech synthesis system for the Hindi Language. We use …
Diptesh Kanojia
,
Preethi Jyothi
,
Pushpak Bhattacharyya
PDF
Cite
Poster
Semi-automatic WordNet Linking using Word Embeddings
Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of …
Kevin Patel
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Code
Dataset
Slides
pyiwn: A Python-based API to access Indian Language WordNets
Indian language WordNets have their individual web-based browsing interfaces along with a common interface for IndoWordNet. These …
Ritesh Panjwani
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
PDF
Cite
Code
Poster
New Vistas to study Bhartṛhari: Cognitive NLP
A sentence is an important notion in the Indian grammatical tradition. The collection of the definitions of a sentence can be found in …
Jayashree Gajjam
,
Diptesh Kanojia
,
Malhar Kulkarni
Preprint
PDF
Cite
Slides
Indian Language Wordnets and their Linkages with Princeton WordNet
Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of …
Diptesh Kanojia
,
Kevin Patel
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Code
Dataset
Poster
Hindi Wordnet for Language Teaching: Experiences and Lessons Learnt
This paper reports the work related to making Hindi Wordnet available as a digital resource for language learning and teaching, and …
Hanumant Redkar
,
Rajita Shukla
,
Sandhya Singh
,
Jaya Saraswati
,
Laxmi Kashyap
,
Diptesh Kanojia
,
Preethi Jyothi
,
Malhar Kulkarni
,
Pushpak Bhattacharyya
PDF
Cite
Slides
Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour
Predicting a reader’s rating of text quality is a challenging task that involves estimating different subjective aspects of the …
Sandeep Mathias
,
Diptesh Kanojia
,
Kevin Patel
,
Samarth Agarwal
,
Abhijit Mishra
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Poster
Scanpath Complexity: Modeling Reading Effort using Gaze Information
Measuring reading effort is useful for practical purposes such as designing learning material and personalizing text comprehension …
Abhijit Mishra
,
Diptesh Kanojia
,
Seema Nagar
,
Kuntal Dey
,
Pushpak Bhattacharyya
PDF
Cite
Dataset
Slides
Sarcasm Suite: A browser-based engine for sarcasm detection and generation
Sarcasm Suite is a browser-based engine that deploys five of our past papers in sarcasm detection and generation. The sarcasm detection …
Aditya Joshi
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
PDF
Cite
Project
Slides
Is your Statement Purposeless? Predicting Computer Science Graduation Admission Acceptance based on Statement Of Purpose
We present a quantitative, data-driven machine learning approach to mitigate the problem of unpredictability of Computer Science …
Diptesh Kanojia
,
Nikhil Wani
,
Pushpak Bhattacharyya
PDF
Cite
Slides
That’ll do fine!: A coarse lexical resource for English-Hindi MT, using polylingual topic models
Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In absence …
Diptesh Kanojia
,
Aaditya Joshi
,
Pushpak Bhattacharyya
,
Mark J. Carman
PDF
Cite
Poster
Sophisticated Lexical Databases - Simplified Usage: Mobile Applications and Browser Plugins For Wordnets
India is a country with 22 officially recognized languages and 17 of these have WordNets, a crucial resource. Web browser based …
Diptesh Kanojia
,
Raj Dabre
,
Pushpak Bhattacharyya
PDF
Cite
Slides
SlangNet: A WordNet like resource for English Slang
We present a WordNet like structured resource for slang words and neologisms on the internet. The dynamism of language is often an …
Shehzaad Dhuliawala
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
PDF
Cite
Slides
Predicting Readers' Sarcasm Understandability by Modeling Gaze Behavior
Sarcasm understandability or the ability to understand textual sarcasm depends upon readers’ language proficiency, social knowledge, …
Abhijit Mishra
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
PDF
Cite
Dataset
Poster
Mapping it differently: A solution to the linking challenges
This paper reports the work of creating bilingual mappings in English for certain synsets of Hindi wordnet, the need for doing this, …
Meghna Singh
,
Rajita Shukla
,
Jaya Jha
,
Laxmi Kashyap
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
PDF
Cite
Slides
Leveraging Cognitive Features for Sentiment Analysis
Sentiments expressed in user-generated short text and sentences are nuanced by subtleties at lexical, syntactic, semantic and pragmatic …
Abhijit Mishra
,
Diptesh Kanojia
,
Seema Nagar
,
Kuntal Dey
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Dataset
Slides
Harnessing Cognitive Features for Sarcasm Detection
In this paper, we propose a novel mechanism for enriching the feature vector, for the task of sarcasm detection, with cognitive …
Abhijit Mishra
,
Diptesh Kanojia
,
Seema Nagar
,
Kuntal Dey
,
Pushpak Bhattacharyya
Preprint
PDF
Cite
Dataset
Poster
Civique: Using Social Media to detect Urban Emergencies
We present the Civique system for emergency detection in urban areas by monitoring micro blogs like Tweets. The system detects …
Diptesh Kanojia
,
Vishwajeet Kumar
,
Krithi Ramamritham
Preprint
PDF
Cite
Poster
Slides
A picture is worth a thousand words: Using OpenClipArt library for enriching IndoWordNet
WordNet has proved to be immensely useful for Word Sense Disambiguation, and thence Machine translation, Information Retrieval and …
Diptesh Kanojia
,
Shehzaad Dhuliawala
,
Pushpak Bhattacharyya
PDF
Cite
Slides
World WordNet database structure: an efficient schema for storing information of WordNets of the world
WordNet is an online lexical resource which expresses unique concepts in a language. English WordNet is the first WordNet which was …
Hanumant Harichandra Redkar
,
Sudha Baban Bhingardive
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
PDF
Cite
Slides
Using Multilingual Topic Models for Improved Alignment in English-Hindi MT
Parallel corpora are often injected with bilingual dictionaries for improved Indian language machine translation (MT). In absence of …
Diptesh Kanojia
,
Aaditya Joshi
,
Pushpak Bhattacharyya
,
Mark J. Carman
PDF
Cite
Slides
TransChat: Cross-Lingual Instant Messaging for Indian Languages
We present TransChat, an open-source, cross platform, Indian language Instant Messaging (IM) application that facilitates cross lingual …
Diptesh Kanojia
,
Shehzaad Dhuliawala
,
Naman Gupta
,
Abhijit Mishra
,
Pushpak Bhattacharyya
PDF
Cite
Poster
PanchBhoota: Hierarchical phrase based machine translation systems for five Indian languages
We present our work on developing fifteen Hierarchical Phrase Based Statistical Machine Translation (HPBSMT) systems for five Indian …
Neha R Prabhugaonkar
,
Apurva S Nagvenkar
,
Diptesh Kanojia
,
Jyoti D. Pawar
,
Pushpak Bhattacharyya
,
Manish Shrivastava
PDF
Cite
Source Document
PaCMan: Parallel Corpus Management Workbench
We present a Parallel Corpora Management tool that aids parallel corpora generation for the task of Machine Translation (MT). It takes …
Diptesh Kanojia
,
Manish Shrivastava
,
Raj Dabre
,
Pushpak Bhattacharyya
PDF
Cite
Poster
Do not do processing, when you can look up: Towards a Discrimination Net for WSD
The task of Word Sense Disambiguation (WSD) incorporates in its definition the role of ‘context’. We present our work on the …
Diptesh Kanojia
,
Pushpak Bhattacharyya
,
Raj Dabre
,
Siddhartha Gunti
,
Manish Shrivastava
PDF
Cite
Slides
More than meets the eye: Study of Human Cognition in Sense Annotation
Word Sense Disambiguation (WSD) approaches have reported good accuracies in recent years. However, these approaches can be classified …
Salil Joshi
,
Diptesh Kanojia
,
Pushpak Bhattacharyya
PDF
Cite
Slides
Discrimination-net for Hindi
Current state-of-the-art Word Sense Disambiguation (WSD) algorithms are mostly supervised and use the P(Sense|Word) statistic for …
Diptesh Kanojia
,
Arindam Chatterjee
,
Salil Joshi
,
Pushpak Bhattacharyya
PDF
Cite
Video
A Study of the Sense Annotation Process: Man v/s Machine.
Does context help determine sense? This question might seem frivolous, even preposterous to anybody sensible. However, our long time …
Arindam Chatterjee
,
Salil Joshi
,
Pushpak Bhattacharyya
,
Diptesh Kanojia
,
Akhlesh Kumar Meena
PDF
Cite
Slides