Contributing to the Bhashini Initiative

About Bhashini

Bhashini aims to build a National Public Digital Platform for languages, developing services and products for citizens by leveraging artificial intelligence and other emerging technologies. The project seeks to create an ecosystem of academic research groups, industry collaborations, and crowdsourcing engines to facilitate research on, and build, scalable deep language models for Indic languages.

ULCA Models

ULCA (Universal Language Contribution APIs) is a standard API and an open, scalable data platform for Indian language datasets and models, supporting various types of datasets. It covers the following classes of models:

Speech-to-speech (STS)
Translation
Automatic Speech Recognition (ASR)
Text-to-speech (TTS)
Optical Character Recognition (OCR)
Transliteration
Language Detection

Thoughtworks Industry Collaboration

I became involved with Bhashini through people at Thoughtworks who were contributing to the initiative. Thoughtworks contributes to the ASR (Vakyansh) models on ULCA.

I am contributing to their Hugging Face ASR models, improving them to be domain adaptive (through hotword detection) and extending them to support inference on longer audio clips.
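
To give a rough idea of what inference on longer clips involves, here is a minimal sketch of one common approach: split the audio into overlapping windows, transcribe each window, and join the results. This is only an illustration and not the Vakyansh implementation. The names chunk_spans and transcribe_long are hypothetical, and transcribe stands in for whatever function runs the ASR model on a short clip.

```python
from typing import Callable, List, Sequence, Tuple


def chunk_spans(num_samples: int, chunk_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering a clip."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    spans: List[Tuple[int, int]] = []
    step = chunk_len - overlap  # how far each new window advances
    start = 0
    while start < num_samples:
        end = min(start + chunk_len, num_samples)
        spans.append((start, end))
        if end == num_samples:  # the last window reached the end of the clip
            break
        start += step
    return spans


def transcribe_long(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],  # hypothetical short-clip ASR call
    chunk_len: int,
    overlap: int,
) -> str:
    """Transcribe each window separately and join the non-empty pieces with spaces."""
    pieces = [
        transcribe(samples[start:end])
        for start, end in chunk_spans(len(samples), chunk_len, overlap)
    ]
    return " ".join(piece for piece in pieces if piece)
```

For a 10-sample clip with windows of 4 samples and an overlap of 1, chunk_spans(10, 4, 1) gives the windows (0, 4), (3, 7) and (6, 10). Joining the pieces naively like this can repeat words that fall inside the overlap, so a real pipeline would need some way of merging the overlapping regions.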

Read more on Bhashini here: https://bhashini.gov.in/en/

GitHub ULCA: https://github.com/ULCA-IN/ulca

GitHub Vakyansh: https://github.com/Open-Speech-EkStep