Published on 07:31 PM, April 03, 2024

Bengali.AI and IUTCS partner for regional dialect datathon

Previously, Bengali.AI collaborated with Google in 2022 to develop a standardised Bangla model.

Bengali.AI, a leading community dedicated to research and innovation in the Bengali language, has announced a Datathon in collaboration with the Islamic University of Technology Computer Society (IUTCS). The Datathon focuses on the history and evolution of Bengali speech recognition, with a special emphasis on regional dialects.

The primary objective of the Datathon is to devise a system for transcribing Bengali speech across various regional dialects. Bengali.AI will provide the speech corpus for the competition, comprising spontaneous speech from 373 individuals across ten geographical locations, including Rangpur, Kishoreganj, Narail and Chattogram.

With a cumulative length of 80 hours, this speech corpus offers a unique opportunity to improve Bengali speech recognition for regional dialects. Submissions to the Datathon will also contribute to the development of open-source speech recognition methods for Bengali.

The online round of the competition runs from April 1 to April 24, 2024, with the final round scheduled for April 27, 2024. The competition is hosted on Kaggle, an online data science platform. Participants can join individually or form teams of up to three members. Undergraduate and graduate students, as well as working professionals, are eligible to participate. International teams are permitted, provided at least one member is Bangladeshi. Cross-university teams are also allowed.

Bengali.AI is a voluntary initiative driving advances in Bengali linguistics. In 2022, it collaborated with Google to develop a standardised Bangla model. Bengali.AI is now working to capture the nuances of regional Bengali dialects, a task that requires extensive data and cooperation.

The initiative has made significant progress, gathering regional dialect data samples from 27,000 individuals across diverse regions. With approximately 100 hours of data for a local dialect model, it is working to develop machine learning models capable of accurately predicting regional Bengali dialects.