About Bhashini
Bhashini aims to build a National Public Digital Platform for languages to develop services and products for citizens by leveraging the power of artificial intelligence and other emerging technologies. The project aims to create an ecosystem of academic research groups, industry collaborations, and crowdsourcing engines to facilitate research on, and build, scalable deep language models for Indic languages.
ULCA Models
ULCA (Universal Language Contribution APIs) is a standard API and open scalable data platform (supporting various types of datasets) for Indian language datasets and models. It consists of the following classes of models:
- Speech-to-speech (STS)
- Translation
- Automatic Speech Recognition (ASR)
- Text-to-speech (TTS)
- Optical Character Recognition (OCR)
- Transliteration
- Language Detection
Thoughtworks Industry Collaboration
I became involved with Bhashini through people at Thoughtworks who were contributing to this initiative. Thoughtworks contributes to the ASR (Vakyansh) models on ULCA.
I am contributing to their Hugging Face ASR models, improving them to be domain-adaptive (through hotword detection) and extending them to support inference on longer audio clips.
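To give a feel for the long-audio part, one common approach is to split a clip into overlapping windows, transcribe each window separately, and stitch the results together. The snippet below is only a minimal sketch of the windowing step, not the actual Vakyansh code: the function name `chunk_bounds` and its parameters are hypothetical, and the 16 kHz sample rate, 30-second window, and 5-second overlap are assumed values chosen for illustration.

```python
from typing import List, Tuple


def chunk_bounds(n_samples: int, chunk_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering a clip.

    Hypothetical helper for illustration; all arguments are in samples.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")

    step = chunk_len - overlap  # how far each new window advances
    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < n_samples:
        end = min(start + chunk_len, n_samples)  # last window may be shorter
        bounds.append((start, end))
        if end == n_samples:
            break  # the clip is fully covered
        start += step
    return bounds


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sample rate
    windows = chunk_bounds(
        n_samples=65 * sample_rate,   # a 65-second clip
        chunk_len=30 * sample_rate,   # 30-second windows
        overlap=5 * sample_rate,      # 5 seconds shared between neighbours
    )
    for start, end in windows:
        print(f"{start / sample_rate:.0f}s to {end / sample_rate:.0f}s")
```

The overlap gives the model some context on both sides of each cut, so a word that straddles a boundary is heard in full by at least one window. How the overlapping transcripts are merged afterwards depends on the model and is left out of this sketch.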
Read more on Bhashini here: https://bhashini.gov.in/en/
GitHub ULCA: https://github.com/ULCA-IN/ulca
GitHub Vakyansh: https://github.com/Open-Speech-EkStep