
“India’s Own AI” — Krutrim: An India-Made Model Built From Scratch And Trained On Vast Indic Data

Karan Kamble

Dec 22, 2023, 04:16 PM | Updated Dec 23, 2023, 12:19 PM IST


Krutrim AI launched by Ola's Bhavish Aggarwal.

“It’s totally hopeless to compete with us on training foundational (AI) models… it’s your job to try anyway,” Sam Altman, the chief executive officer of ChatGPT maker OpenAI, said at The Economic Times Conversations in early June 2023.

Only about six months later, Ola, best known as a ride-hailing company, has unveiled “India’s own AI.”

They call it ‘Krutrim’, Sanskrit for “artificial.”

However, there’s nothing “artificial” about Krutrim (we get the ‘artificial intelligence’ wordplay). Krutrim’s stated vision is to create India’s own AI for 1.4 billion Indians. It is a noble homegrown initiative, but also quite a daring one, given the sheer dominance of the AI chatbot ChatGPT worldwide, and especially in India.

“India has been a country that has truly embraced ChatGPT. There has been a lot of early adoption and real enthusiasm from the users,” Altman himself has admitted.

But if any AI is to steer interest away from ChatGPT and others like Microsoft’s Bing AI and Google’s Bard AI, it would have to be an India-made AI, tailored to the Indian context.

“There are very few moments in time when a technology comes along that can impact both the economy and culture profoundly,” Ola co-founder Bhavish Aggarwal said on 15 December, introducing Krutrim to the world through a live launch event.

Stating that the various AI language models in existence today do not particularly incorporate India’s cultural context, Aggarwal said, “An India-first AI should be able to understand our uniqueness and right cultural context. It needs to be trained on unique data sets specific to us, and, on top of it all, it needs to be accessible to India, with India-first cost structures.”

Krutrim is the company’s first family of AI models developed here in India, for India. Starting with this AI model, the company plans to build other AI models across text, voice, and vision, and even go beyond large language models (LLMs) over time. For now, Krutrim Pro, a very large multimodal model, is in the works. It will have more sophisticated problem-solving and task-execution capabilities.

Krutrim is Ola’s base LLM.

An LLM is a type of AI model that is trained to understand and generate human language. Using its training, it performs a wide range of language-related tasks, such as language translation, text summarisation, and question-answering, among others.

“Some examples of large language models include OpenAI's GPT (Generative Pre-trained Transformer) series, such as GPT-2 and GPT-3, which have been widely used for natural language processing tasks due to their impressive capabilities in understanding and generating human-like text,” GPT-3.5 tells me, not uncomfortable about self-referencing.

The first version of Krutrim’s model took three months to train. Remarkably, the model was trained on more than 2 trillion “tokens.”

A token is a fundamental unit of input or output. Tokens can represent individual words, subwords, characters, or other elements of a sequence, depending on the specific task or model architecture. Models like Krutrim understand language in tokens.
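To make the idea of a token concrete, here is a minimal Python sketch that splits the same text at two of the granularities mentioned above: words and characters. It is only an illustration. The function names are hypothetical, and this is not Krutrim’s tokenizer, whose details were not disclosed at the launch; production models generally rely on learned subword vocabularies rather than simple splitting.

```python
def word_tokens(text: str) -> list[str]:
    """Split text into word-level tokens on whitespace (illustrative only)."""
    return text.split()


def char_tokens(text: str) -> list[str]:
    """Split text into character-level tokens, dropping spaces (illustrative only)."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    sentence = "namaste duniya"
    print(word_tokens(sentence))  # two word-level tokens
    print(char_tokens(sentence))  # thirteen character-level tokens
```

The same sentence yields 2 tokens at the word level and 13 at the character level, which is why token counts depend heavily on how a given model chooses to split text.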

“We probably have 20 times the number of Indic tokens that any other model has,” Gautam Bhargava, who leads engineering at Ola and, along with his team, has built all of the company’s apps, said on launch day.

“This LLM, by far, has the largest representation of Indian data used in training ever,” said Ravi Jain, the chief marketing and revenue officer at Ola. As a result, the model has a truly Indic persona and responds accordingly.

It can understand 22 Indian languages and can generate content in about 10, including Marathi, Bengali, Tamil, Kannada, Telugu, Malayalam, Odia, and Gujarati. The AI can not only understand multiple Indian languages but also make sense of mixed languages like Hinglish (Hindi and English combined).

The model is said to be capable of powering most day-to-day applications. The use cases proposed for it range from everyday tasks like reserving a table at a restaurant to managing large-scale customer support for businesses.

In live demonstrations meant to exhibit the model’s understanding of the Indian context, the Krutrim team had the AI chatbot generate a poem in Tamil welcoming guests to their startup event and another poem in Bengali describing the beauty of monsoons.

Additionally, when asked to write code for bubble sort in the popular programming language C++, it did so. This simple sorting algorithm works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items, and swapping them if they are in the wrong order, until the list is sorted.
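For readers curious about what bubble sort looks like in practice, here is a minimal sketch. It is written in Python for brevity rather than the C++ used in the demo, and it is not the code Krutrim generated, which the launch coverage does not reproduce. The function name is illustrative.

```python
def bubble_sort(items):
    """Return a sorted copy of items using bubble sort."""
    result = list(items)  # work on a copy so the caller's list is untouched
    # After each pass, the largest remaining item has "bubbled" to position `end`.
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            # Compare each pair of adjacent items and swap if out of order.
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:
            break  # a full pass with no swaps means the list is sorted
    return result


if __name__ == "__main__":
    print(bubble_sort([5, 1, 4, 2, 8]))  # prints [1, 2, 4, 5, 8]
```

Bubble sort is easy to follow but slow on long lists, which is why it is mostly used as a teaching example or, as here, a quick test of code generation.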

Notably, the model is voice-enabled, allowing the user to interact with it by speech, though this was not part of the demonstrations.

Krutrim will be released to the public over this month and the next. By the end of January 2024, everyone will have access to Krutrim, Aggarwal said. People who wish to try the AI have been invited to join the waitlist on the Krutrim website. The company is providing early access in batches from 15 December.

The more advanced Krutrim Pro will be launched next quarter. Developer APIs will go live in February 2024. (APIs, or application programming interfaces, enable applications to exchange data and functionality easily and securely.)

The AI teams behind Krutrim work out of Bengaluru, Karnataka (India), and the San Francisco Bay Area, California (the United States). They and others in Ola have already adopted Krutrim to assist them with work.

“All of Ola group companies are already using Krutrim for a lot of their internal workloads, be it customer support, voice and chat, customer sales calls,... and for other processes,” Aggarwal told his live audience.

Krutrim's live demos

How It Compares To GPT-4

Chandra Khatri, who leads the AI efforts at Krutrim, claimed during the launch that their model outperformed GPT-4 over a range of Indic languages. They said they arrived at this conclusion by having human language experts evaluate and compare responses to thousands of prompts and questions fed into Krutrim and GPT, as well as many other models, across various Indic languages.

When it came to the results in English, especially Indic English, Krutrim fell short of GPT-4 and Google’s Bard or Gemini, but not by much, and did better than Llama 2 Chat, the well-known open-source model from Meta that is equivalent in size to Krutrim.

As part of the evaluation, the model was put through a variety of tasks covering aspects like reasoning, mathematics, and coding.

Impressively, Krutrim is not the result of simply fine-tuning another model. “We had a whole vision of doing foundational models for India using Indic languages,” Bhargava said.

Only about 15 technology players of the over 10,000 around the world are working on developing foundational models, and Krutrim has now joined that small club, having built its foundational model from scratch.

“This is not just a wrapper on some existing API. This is not just a little bit of fine-tuning done… taking an existing model and putting a little bit more of a data set into it. This is deep foundational work, starting from the science layer, changing the math and the algorithms of the models to make it more relevant for Indian languages, putting in the right mixture of data, and generating this outcome,” said Aggarwal.

India’s First AI Supercomputer?

The Krutrim AI largely comprises three critical elements — applied AI and engineering (the AI models), infrastructure, and silicon software and hardware, all stitched together.

The silicon software and hardware aspects form the foundation for the resulting AI models and their applications.

Sambit Sahu, who looks after hardware design at Krutrim, said they came up with a novel architecture integrating multiple chiplets, wherein a chiplet is a small piece of silicon executing a certain functionality, such as a CPU chiplet or an AI chiplet. All the chiplets then take their place in what’s called a “package.” Krutrim plans to develop a package prototype in a few months.

“The architecture is ready and we are now marching on to implementation,” said Sahu, who, according to Aggarwal, has made the most chips in India.

He added: “Not only (have) we come up with an SoP, we also want to scale this SoP to build clusters and to ultimately build supercomputers. We are coming up with novel architecture to take this SoP… to ultimately scale up to build India’s first AI supercomputer.”

As for the other element, infrastructure, meaning the data centre or “cloud” that effectively powers the AI, Krutrim has developed technology to bring down the energy cost of data centres, which otherwise tend to guzzle energy. Krutrim uses a liquid-cooling heat exchange mechanism that cuts down on energy waste.

“Data centres in India have a PUE of 1.5, which means about 50 per cent of the energy is wasted on top of 1 unit which is used for useful compute. Now, this technology has a PUE of 1.1; that means only 10 per cent of the energy is wasted,” Aggarwal said, adding that they are prototyping this technology and are in the advanced stages of deployment.
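The arithmetic behind these figures can be checked with a short sketch. PUE, or power usage effectiveness, is the ratio of a data centre’s total energy use to the energy used by its computing equipment alone. The function names below are illustrative and the inputs are simply the two PUE values quoted above; nothing here reflects Krutrim’s actual measurements.

```python
def overhead_per_compute_unit(pue: float) -> float:
    """Extra energy (cooling, power delivery, etc.) per 1 unit of useful compute."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


def overhead_share_of_total(pue: float) -> float:
    """The same overhead expressed as a fraction of the facility's total energy."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return (pue - 1.0) / pue


if __name__ == "__main__":
    for pue in (1.5, 1.1):
        extra = overhead_per_compute_unit(pue)
        share = overhead_share_of_total(pue)
        print(f"PUE {pue}: {extra:.0%} extra per unit of compute, "
              f"{share:.0%} of total energy")
```

The 50 per cent and 10 per cent figures in the quote are measured against the energy used for compute. Measured against the facility’s total energy draw instead, the overhead works out to roughly 33 per cent at a PUE of 1.5 and roughly 9 per cent at a PUE of 1.1.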

Krutrim has a large vision for India’s future as an AI-first economy. “The vision is not a Krutrim or an Ola vision,” according to Aggarwal. “This is what India needs. If you look at the penetration of computing in India, it’s a fraction of what China and the West has. And that’s unfair. And it’s unfair because I feel, to their own self-harm, global computing companies have not really gone deep into India…”

“For India to be an AI-first economy, we need to build the whole stack at the Indian performance levels, with the Indian cultural relevance, and the Indian cost structure,” he added.

What About Bhashini?

On launch day, Aggarwal was asked about how Krutrim compared to the AI-based language translation work underway through the Indian government initiative, Bhashini.

Although the Ola co-founder and chief executive did not directly address the comparison, he said, “To build really useful large models, there is a lot of engineering and data ops (operations) and architectural work that is required to really compete with the best in the world. So, some of these efforts need a strong foundation, effort that we have laid in. And then community can build on top of it.”

He added that they plan to “leverage the power of the Indian academic community, the Indian research community, startup community to really build on top of what we have created.”

For more than two years now, India’s researchers have been putting their heads together in an effort, coordinated by the Government of India, aimed at developing AI models trained in Indian languages. The initiative is called Mission Bhashini, short for BHASHa INterface for India.

Bhashini aims to enable easy access to the internet and digital services for all Indians in their language and increase the availability of online content in Indian languages. At its core, the means to accomplish this aim is simply language translation through technology, particularly AI technology.

For this purpose, Bhashini has created an ecosystem, pooled data and models contributed by the ecosystem into a shared repository, and encouraged the development of products and services in Indian languages by drawing from the open repository. This is an ongoing process.

The Bhashini ecosystem comprises government, academia, research groups, startups, industry, and even citizens, who are natural repositories of languages in India.

Within the ecosystem, the work currently underway involves building up abundant language data that can be used by researchers to develop AI language models, based on which the industry and government will build innovative products and services for citizens.

“GoI’s effort & approach is that of a Market Maker and it is great to see multiple efforts in this direction especially by the sunrise sector. Language Tech & AI is truly Democratised. The next step is drive use cases,” Amitabh Nag, the chief executive officer of Digital India Bhashini, said in an X post in the context of the Krutrim development and especially its comparison with Bhashini.

Bhashini’s goal of accomplishing translation from one Indian language to another, across text, speech, and video, carries profound implications for the country.

Thanks to Bhashini, a speech delivered by Prime Minister Narendra Modi in Hindi was translated into Tamil in real-time earlier this week.

"This is a first for me. Typically, I communicate in Hindi and AI will be responsible for translating it into Tamil," the Prime Minister said, addressing the crowd at the Kashi Tamil Sangamam.

"This is a new beginning and, hopefully, it makes it easier for me to reach you," he added.

In July, Prime Minister Modi even spoke about sharing Bhashini within the Shanghai Cooperation Organisation (SCO) — an intergovernmental organisation comprising eight member states that speak different languages.

While addressing the SCO Summit 2023 in Hindi, he said, “We would be delighted to share India’s AI-based language platform Bhashini with everyone to remove language barriers within SCO. It can become an example of digital technology and inclusive growth.”


Karan Kamble writes on science and technology. He occasionally wears the hat of a video anchor for Swarajya's online video programmes.

