dialects

Natural Language Processing for Dialects of a Language: A Survey

State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a …

Connecting Ideas in Lower-Resource Scenarios: NLP for National Varieties, Creoles, and Other Low-resource Languages

Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages situated in 'lower-resource' scenarios such as dialects/sociolects (national or social varieties of a language), …

Sampling Strategies for Creation of a Benchmark for Dialectal Sentiment Classification

This paper investigates data sampling strategies to create a benchmark for dialectal sentiment classification of Google Places reviews written in English. Based on location-based filtering, we collect a self-supervised dataset of reviews in …

Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios

Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages situated in 'lower-resource' scenarios such as dialects/sociolects (national or social varieties of a language), …