Sarvam AI Beats Google, ChatGPT in Indian Language OCR & Speech Tests

Bengaluru-based Sarvam AI has announced that its vision and speech models have outperformed global leaders such as Google Gemini and ChatGPT on key benchmarks for optical character recognition (OCR) and text-to-speech in Indian languages. The company's Sarvam Vision model achieved state-of-the-art accuracy on the English-only subset of the olmOCR-Bench OCR benchmark and supports all 22 scheduled Indian languages. Its Bulbul V3 text-to-speech model offers 35 voices across these languages and can handle content of varying quality. Union IT Minister Ashwini Vaishnaw highlighted the startup's work as a reflection of the success of India's AI mission.

Key Points: Sarvam AI Outperforms Google, ChatGPT in Indian Language AI

  • Beats Gemini & ChatGPT on OCR benchmarks
  • Supports all 22 scheduled Indian languages
  • Excels at complex table & chart parsing
  • Aims to make AI widely accessible in India

India's Sarvam AI outperforms global peers in OCR, speech models

Bengaluru startup Sarvam AI claims its vision and speech models beat global giants on OCR and text-to-speech benchmarks for India's 22 scheduled languages.

"On Indian languages, Sarvam Vision is the best model by far - Pratyush Kumar"

Mumbai, Feb 9

Bengaluru‑based startup Sarvam AI has claimed that its latest vision and speech models have outperformed larger global rivals Google Gemini and ChatGPT on key optical character recognition and text‑to‑speech benchmarks for Indian languages.

In a post on X, Sarvam AI co-founder Pratyush Kumar said, "Sarvam Vision achieves state-of-the-art accuracy of 84.3 per cent on the olmOCR-Bench (English-only subset), outperforming frontier models like Gemini 3 Pro and recent OCR models like DeepSeek OCR 2."

On the English-only subset of OmniDocBench v1.5, Sarvam Vision achieved an overall score of 93.28 per cent, excelling at complex formulas and layout parsing and coming within touching distance of the current state of the art, Kumar added.

Kumar also said the company's Bulbul V3 text-to-speech model supports 35 voices across all 22 scheduled Indian languages and can handle scans and content of varying quality.

"On Indian languages, Sarvam Vision is the best model by far, while supporting all 22 scheduled Indian languages," he claimed.

The Vision series includes a 3‑billion‑parameter state‑space model capable of image captioning, scene text recognition, chart interpretation and complex table parsing.

Sarvam AI said its focus is on making artificial intelligence widely accessible to everyone in India. "We want India to embrace the most important technological shift of our time with confidence and control. Our ambition is to build foundational components and apply them to the country's unique needs," the AI company said.

Kumar cited several examples on social media in which the model accurately extracted technical terminology from complex tables with merged rows and columns. Another example showed Sarvam Vision extracting data from a chart in the latest Economic Survey.

Beyond documents, his posts showed Sarvam Vision's general understanding of natural scenes, with the model accurately describing a photograph of a scenic landscape.

Union IT Minister Ashwini Vaishnaw said in a recent post on X that the startup's work reflected the success of India's AI mission.

- IANS


Reader Comments

Priya S
As someone who works with old land records in Tamil, the OCR performance on complex tables is what excites me most. If this can digitize our archives accurately, it will save years of manual work. Hope the government adopts this widely.
Michael C
Impressive benchmarks. Beating giants like Google on specific tasks shows the value of focused, local expertise. The text-to-speech for 35 voices is also noteworthy. Curious about the pricing model for wider accessibility.
Rohit P
Great to see Indian startups leading! But a word of caution - we need to see real-world deployment and not just benchmarks. Many models work well in labs but fail with the poor quality scans we often get in government offices. Hope Sarvam has tested for that.
Shreya B
My grandmother only speaks Marathi. If this Bulbul model can read her news articles or messages aloud in a natural voice, it would mean the world to her. Technology that includes our elders is true progress. 👵💖
Karthik V
The minister's support is key. This is exactly what the "Make AI in India" vision should be about - solving our unique problems, from parsing economic surveys to understanding village documents. More power to the team!
