News Brief
Google DeepMind's India unit is developing an Indic language AI project called Morni (Multimodal Representation for India), which aims to cover 125 Indian languages and dialects in order to build inclusive and equitable Indic AI.
"So India has 22 scheduled languages, which are viewed as official languages. But in our work, we are targeting over 100 Indian languages, because we find that there are 60 Indian languages which have over a million speakers and over 125 languages that have over a lakh speakers each," said Manish Gupta, Director of Google DeepMind, Google India, while speaking at the Global Fintech Fest in Mumbai on Thursday (29 August), the Economic Times reported.
Gupta further elaborated that 73 of these 125 languages lack any existing digital data.
He pointed out that even though Hindi is spoken by nearly 10 per cent of the global population, it constitutes only 0.1 per cent of the text available online.
Gupta also spoke about Project Vaani, which was announced in December 2022 and seeks to collect and transcribe 154,000 hours of open-source, anonymised speech data from all 773 districts of India.
He said that the first phase of Vaani has created an open-source database comprising over 14,000 hours of speech data across 58 languages, gathered from 80,000 speakers in 80 districts.
Gupta added that the team is currently working through phase two, which will span all states and cover 160 districts.