Cohere, an enterprise AI company, released its first voice model on Thursday, launching Transcribe as an open-source automatic speech recognition system designed for note-taking and speech analysis tasks. The announcement marks a significant move in the competitive transcription market: a purposefully small model built for efficiency rather than scale.
Relatively light at just 2 billion parameters, the model is designed to run on consumer-grade GPUs for those who want to self-host it. This design choice has real implications. Smaller models mean lower electricity bills for companies running transcription at scale. They mean businesses uncomfortable sending audio to cloud providers can deploy locally. They mean researchers and smaller firms get access without subscription fees.
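To give a rough sense of why 2 billion parameters is within reach of consumer hardware, the sketch below estimates the memory needed to hold the weights alone. The numeric precisions are assumptions for illustration; the announcement does not say which format Transcribe ships in, and real usage adds activations and runtime overhead on top.

```python
# Weights-only memory estimate for a 2-billion-parameter model.
# Assumption: the precisions listed are illustrative, not confirmed by Cohere.
PARAMS = 2_000_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in GiB (1 GiB = 2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gib(PARAMS, size):.2f} GiB")
```

Under these assumptions the weights come to roughly 3.7 GiB at 16-bit precision, which is well within the memory of many consumer graphics cards.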
Cohere says Transcribe beats models such as Zoom Scribe v1, IBM Granite 4.0 1B, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B Speech on the Hugging Face Open ASR leaderboard, achieving an average word error rate of 5.42%, lower than any other model on the benchmark. The company also claims Transcribe had an average win rate of 61% over other models when human evaluators assessed its transcriptions for accuracy, coherence, and usability. Those are substantial claims. Human evaluation is often a better guide to real-world usefulness than benchmark metrics, so the 61% win rate suggests the model handles the messy reality of actual speech reasonably well.
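For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the leaderboard's scoring code; benchmarks typically normalise casing and punctuation first, which is omitted here, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences: one substitution ("sat" -> "sit"), one deletion.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.1%}")
```

In this made-up example, two errors over six reference words give a word error rate of about 33%. A 5.42% average corresponds to roughly one error in every 18 words.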
Cohere says Transcribe can process 525 minutes of audio in one minute, which is high for its class of model. For enterprises running large volumes of meetings, customer calls, or broadcast content, speed directly affects cost, because faster processing means less compute time for every hour of audio.
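A quick back-of-the-envelope calculation shows what that figure would mean for a large archive. The sketch assumes the claimed rate is sustained for the whole job on a single instance; the announcement does not specify the hardware behind the number, and the 1,000-hour workload is a hypothetical example.

```python
# Cohere's claimed throughput: minutes of audio processed per minute.
CLAIMED_SPEEDUP = 525


def processing_minutes(audio_hours: float, speedup: float = CLAIMED_SPEEDUP) -> float:
    """Minutes needed to transcribe `audio_hours` of audio at a given speedup."""
    return audio_hours * 60 / speedup


if __name__ == "__main__":
    hours = 1_000  # hypothetical backlog of recorded calls
    minutes = processing_minutes(hours)
    print(f"{hours} hours of audio: about {minutes:.0f} minutes of processing")
```

At the claimed rate, 1,000 hours of recordings would take about 114 minutes, or just under two hours.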
The caveat
However, the model fell behind its rivals when transcribing Portuguese, German, and Spanish. Cohere made a deliberate trade-off: depth in fewer languages rather than shallow coverage across many. The model currently supports 14 languages: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese, Japanese, Korean, Vietnamese, and Arabic. The choice reflects pragmatism. Spreading a small model across more languages can dilute training data and weaken performance overall. By focusing on 14, Cohere optimised for quality in languages with significant commercial demand.
The company is planning to integrate Transcribe into its enterprise agent orchestration platform, North, and is making the model available through its API for free. The model will also be available on Model Vault, Cohere's managed inference platform. Offering free API access is a classic open-source strategy: attract users on the free tier, monetise through infrastructure and premium deployment services.
Speech recognition models are becoming increasingly popular as demand rises for note-taking and dictation apps like Granola and Wispr Flow. The timing is sound. Enterprises want transcription baked into workflows without vendor lock-in. Smaller app builders want accuracy without monthly API bills.
Cohere's approach inverts the logic that dominated AI development for years. Bigger was presumed better. More parameters meant more capability. Transcribe succeeds by being disciplined about scope, focused on deployment practicality, and transparent about trade-offs. In a market where OpenAI's Whisper and proprietary services from ElevenLabs and Deepgram dominate, an honest, lean, open-source alternative matters. Whether it persuades enterprises already locked into established vendors is another question.