New Delhi: Sarvam AI has launched Saaras V3, the latest version of its speech recognition model, with a strong focus on mixed-language and noisy audio environments.
The new model supports all 22 scheduled languages of India and now offers real-time streaming, allowing users to get low-latency transcriptions without losing accuracy. Saaras V3 also includes automatic language detection, word-level timestamps, and speaker identification for multi-speaker recordings.
Drop 8/14: Introducing Saaras V3, the next iteration of our speech recognition model. We extend our lead in this space with an even more accurate model, particularly for mixed-language and noisy speech.
We have also expanded support for all the 22 scheduled languages of India.…
— Pratyush Kumar (@pratykumar) February 11, 2026
According to the company, the model is designed for use in voice bots, subtitling, and large-scale analysis of call recordings. Sarvam says the update extends its leadership in speech recognition for Indian languages.
The launch of Saaras V3 is part of a wider set of announcements made by the company in a series of social media posts.
Earlier, Sarvam revealed that it had enabled live multi-language dubbing of the Union Budget speech on Republic TV, reaching millions of homes with under two minutes of delay. This was powered by its Sarvam Dub system, which focuses on retaining speaker voice similarity while delivering fast translations.
Drop 6/14: @SarvamAI is proud to announce a landmark in India’s sovereign AI journey through strategic partnerships with the Governments of Odisha and Tamil Nadu. The aim of these partnerships is to drive transformation by building at-scale compute, sovereign models, and the… pic.twitter.com/Scx9mK6CPw
— Pratyush Kumar (@pratykumar) February 9, 2026
The company also highlighted the growth of its conversational platform, Samvaad, which now handles over one million minutes of interactions every day. These AI-powered voice agents are being used for customer service, sales, and large-scale outreach programmes.
Sarvam claimed that nearly 80 per cent of its automated calls are now difficult to distinguish from calls made by humans. The company says this has led to higher customer engagement and stronger sales interest.
In another announcement, Sarvam introduced Sarvam Vision, a 3-billion-parameter vision-language model aimed at improving digitisation in Indian languages. It also launched Bulbul V3, its latest text-to-speech system, which topped a third-party human listening study for preference and accuracy.
The company has also partnered with the governments of Odisha and Tamil Nadu to build state-level AI infrastructure and deploy AI across departments. These projects aim to develop sovereign models and expand public use of artificial intelligence.
Drop 5/14: Introducing Bulbul V3, our latest text-to-speech model. It raises the bar for how human it sounds, while being super robust.
In an independent third-party human listening study, Bulbul V3 delivers the highest listener preference, and low error rates across use-cases… pic.twitter.com/w7HThWzuKe
— Pratyush Kumar (@pratykumar) February 7, 2026
Sarvam also unveiled Arya, a multi-agent orchestration platform for enterprise use. The company plans to open-source the system, which is designed to support large-scale, reliable AI workflows.
Throughout the thread, Sarvam stressed its focus on building a “sovereign AI” ecosystem for India. It argued that local data and models are key to long-term economic and technological growth.
With Saaras V3 at the centre of its latest rollout, Sarvam is positioning itself as a major player in Indian-language speech technology, targeting applications across media, government, and enterprise services.