Introducing Mist v3: TTS Built for Enterprise Scale
Apr 6, 2026

Today we're releasing Mist v3, our fastest and most efficient text-to-speech model to date, built for the latency and throughput demands of enterprise production voice deployments.
If you've built on Mist before, the voices are the same. That's intentional. What changed is how requests are ingested, handled, and processed under the hood.
The result is a p90 time-to-first-byte of approximately 40ms on an L40S or RTX 6000, while maintaining high throughput and all of the pronunciation control that Mist is known for.
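Time-to-first-byte is easy to measure yourself against any streaming TTS endpoint: start a timer, issue the request, and stop when the first audio chunk arrives. A minimal sketch, using a simulated stream in place of a real Mist v3 response:

```python
import time
from typing import Iterable, Iterator, Tuple

def time_to_first_byte(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first audio chunk arrives, that chunk)."""
    start = time.perf_counter()
    first = next(iter(chunks))
    return time.perf_counter() - start, first

def fake_stream() -> Iterator[bytes]:
    """Simulated streaming response standing in for a real TTS stream."""
    time.sleep(0.04)      # pretend the server takes ~40 ms before audio starts
    yield b"\x00" * 320   # first audio frame
    yield b"\x00" * 320

elapsed, first_chunk = time_to_first_byte(fake_stream())
print(f"TTFB: {elapsed * 1000:.1f} ms, first chunk: {len(first_chunk)} bytes")
```

Swap `fake_stream()` for the chunk iterator of your HTTP client's streaming response to measure real-world TTFB from your own infrastructure.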
"Mist v3 is incredibly fast, and deterministic pronunciation is a game-changer. When you're powering live conversations at scale, you can't have a model guessing brand names or proper nouns. The combination of speed and reliability makes Rime TTS a great option for voice agents." — Tom Shapland, LiveKit
Why we built this
The Text-to-Speech (TTS) layer often becomes a bottleneck in voice AI pipelines. Many models perform adequately in single-request tests but were not designed for sustained high throughput, so latency degrades and infrastructure costs soar under load. That scalability gap is a serious limitation for high-volume operations: enterprise contact centers managing thousands of concurrent calls, or AI-native companies scaling rapidly.
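The distinction between single-request speed and sustained concurrency is easy to check empirically: fire many requests at once and look at tail latency rather than the average. A minimal load-test sketch, where `synthesize` is a simulated stand-in for a real TTS call (not Rime's API):

```python
import asyncio
import random
import statistics

async def synthesize(text: str) -> float:
    """Stand-in for one TTS request; returns its simulated latency in seconds."""
    latency = random.uniform(0.03, 0.06)
    await asyncio.sleep(latency)
    return latency

async def load_test(n_concurrent: int) -> float:
    """Fire n_concurrent requests at once and report the p90 latency."""
    latencies = await asyncio.gather(
        *(synthesize(f"utterance {i}") for i in range(n_concurrent))
    )
    return statistics.quantiles(latencies, n=10)[-1]  # last cut point = p90

p90 = asyncio.run(load_test(100))
print(f"p90 latency: {p90 * 1000:.1f} ms")
```

Replacing the stand-in with real requests against your endpoint shows exactly where a model's latency curve bends as concurrency grows.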
We focused on the layer that would drive the most value for our partners: a new inference engine built to sustain high throughput across concurrent TTS requests. Pronunciation control is still available out of the box, along with additional SSML features like controllable pauses and in-line speed adjustment.
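As an illustration of what in-line control can look like, here is a small helper that builds an SSML fragment with a controllable pause and a rate change. The tags follow the W3C SSML spec; check the Rime docs for the exact markup Mist v3 accepts:

```python
def with_pause_and_rate(before: str, after: str, pause_ms: int, rate: str) -> str:
    """Build an SSML fragment: a pause after `before`, then `after` at `rate`."""
    return (
        "<speak>"
        f"{before}<break time=\"{pause_ms}ms\"/>"
        f"<prosody rate=\"{rate}\">{after}</prosody>"
        "</speak>"
    )

ssml = with_pause_and_rate("Your total is", "forty dollars.", pause_ms=300, rate="slow")
print(ssml)
# <speak>Your total is<break time="300ms"/><prosody rate="slow">forty dollars.</prosody></speak>
```

The same pattern extends to other SSML controls: the payload stays plain text with markup, so no client-side changes are needed beyond building the string.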
For teams building real-time conversational voice applications, the throughput capabilities translate directly to user experience. Ali Mansoor, Founding Engineer at Trillet AI, put it plainly: "Mist v3 on a co-located endpoint has been a 3x latency improvement for us. We're consistently seeing sub-100ms TTFB in production — that's the difference between a conversation that feels human and one that doesn't."
How Mist v3 is changing the game for Attune AI
Attune, the Agentic Voice Platform for Healthcare, deploys Rime's Mist v3 on-premises to power AI-driven patient and member engagement for some of the nation's largest healthcare organizations.
Operating in a highly regulated environment where data residency, HIPAA compliance, and voice quality are non-negotiable, Attune chose to self-host Mist v3 to maintain full control over its infrastructure while delivering natural, empathetic conversations at enterprise scale.
With Mist v3 running on-prem, Attune has dramatically reduced TTS latency while handling thousands of concurrent voice interactions across its platform.
"In healthcare, the voice is the first point of trust. Patients need to feel heard, not processed — and that starts with how the AI sounds and how fast it responds. Running Mist v3 on-prem gives us the latency, cost structure, and data control that healthcare demands. It's become a core part of our infrastructure." — Jack Ryan, Chief Product Officer, Attune
Deploying Mist v3
Mist v3 is available now across cloud and on-premises deployments. Check out the docs to learn more or sign up for a free account to start building today.