Build voice agents through Vercel AI Gateway, but keep the beta label visible
Vercel AI Gateway now supports realtime voice, speech generation, and transcription through AI SDK 7, with beta limits and server-minted tokens to keep keys out of browsers.
Vercel AI Gateway now supports realtime voice agents, speech generation, and transcription through the AI SDK. Confidence level: confirmed beta. The useful part is not audio support alone but that routing, observability, spend controls, and bring-your-own-key handling live in the same gateway layer teams may already use for text, image, and video models.

What changed
Vercel says AI Gateway now supports voice and audio models for three main jobs: realtime speech-to-speech agents, text-to-speech generation, and speech-to-text transcription. The capabilities are in beta and available through AI SDK 7.
The realtime quickstart shows a server route minting a short-lived token and a browser client connecting through the AI SDK realtime hook. That matters because the browser should never receive the AI Gateway API key.
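As a rough illustration of that token pattern, the sketch below models a server-side issuer that hands out random tokens with an expiry. It is not the AI Gateway or AI SDK API: `ShortLivedTokenIssuer`, `mint`, and `isValid` are hypothetical names, and the in-memory map is only a stand-in for whatever the gateway does on its side. Follow Vercel's quickstart for the real calls; the point here is only that the browser receives a value that expires, never the gateway key.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical shape of what a token route could return to the browser.
interface IssuedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory issuer. Illustrates "server mints, browser receives";
// it does not call or imitate any Vercel endpoint.
class ShortLivedTokenIssuer {
  private readonly live = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Mint a random, unguessable token that expires ttlMs from now.
  mint(): IssuedToken {
    const token = randomBytes(24).toString("base64url");
    const expiresAt = this.now() + this.ttlMs;
    this.live.set(token, expiresAt);
    return { token, expiresAt };
  }

  // A token is valid only if this issuer minted it and it has not expired.
  isValid(token: string): boolean {
    const expiresAt = this.live.get(token);
    if (expiresAt === undefined) return false;
    if (this.now() >= expiresAt) {
      this.live.delete(token); // drop expired entries lazily
      return false;
    }
    return true;
  }
}
```

In a real route handler, only `token` and `expiresAt` would go back to the browser, and the lifetime would be kept as short as the connection handshake allows.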
Why this is early
The feature is new: the changelog is dated June 29, 2026, and the realtime quickstart still points at canary AI SDK packages for the AI Gateway provider. The capability is official, but developers should treat production use as beta until the package channel, model list, pricing, and provider behavior settle.
Releasebot provides external release-feed context, while Vercel's changelog and docs carry the actual implementation details. The strongest evidence is Vercel's own docs, not community examples or Product Hunt-style launch posts.
Key takeaways
- AI Gateway now covers realtime voice, speech generation, and transcription.
- Vercel describes the feature as beta and available via AI SDK 7.
- Realtime browser agents should use a server-minted short-lived token.
- Voice support sits behind the same observability and spend controls as other AI Gateway model traffic.
- Model support varies, so test the exact realtime, speech, or transcription model before shipping.
| Capability | Use it for | Access path | Caveat |
|---|---|---|---|
| Realtime voice | Live speech-to-speech agents | AI Gateway realtime model plus AI SDK hook | Beta; provider behavior and latency need testing |
| Speech generation | Audio replies, voiceovers, content narration | AI SDK generateSpeech | Voice, format, and language support vary by provider |
| Transcription | Voice notes, uploaded audio, call summaries | AI Gateway and AI SDK audio flow | Accuracy, file limits, and privacy controls need validation |
| Server token route | Browser voice apps | getToken on the server | Do not expose gateway keys to clients |
Availability and access
The changelog says the feature is available through AI SDK 7, while the realtime quickstart references canary packages for the gateway provider and React bindings. That means teams should pin versions deliberately and avoid assuming every AI SDK 7 install has the same realtime behavior.
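One way to make "pin deliberately" checkable is a small script that flags any dependency whose version spec is a range or a dist-tag rather than an exact version. The sketch below is an assumption-laden example: the package names are the ones we would expect for the SDK, gateway provider, and React bindings, and the version strings are invented placeholders, not real releases.

```typescript
// Matches exact versions such as "7.0.0" or "7.0.0-canary.1".
// Ranges ("^7.0.0", "~7.0.0", ">=7"), wildcards, and dist-tags ("canary") do not match.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

function isExactPin(spec: string): boolean {
  return EXACT_VERSION.test(spec.trim());
}

// Returns the names of dependencies that are not pinned to one exact version.
function findUnpinned(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => !isExactPin(spec))
    .map(([name]) => name)
    .sort();
}

// Placeholder dependency map: package names are assumed, versions are invented.
const exampleDeps: Record<string, string> = {
  "ai": "^7.0.0",
  "@ai-sdk/gateway": "canary",
  "@ai-sdk/react": "7.0.0-canary.1",
};

console.log(findUnpinned(exampleDeps));
```

Here the caret range and the bare `canary` tag are flagged, while the exact prerelease version passes. Running a check like this in CI keeps a reinstall from silently pulling a different canary build.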
Vercel also says AI Gateway adds no markup or platform fees for these audio capabilities, but you still need to check upstream model costs, gateway billing, provider availability, rate limits, and whether your chosen model supports the audio mode you need.
Practical LinkLoot angle
This is useful for teams that want one gateway for text, image, video, voice, and transcription instead of stitching separate provider SDKs into every product feature. A practical first build is a private internal voice agent: one route for short-lived tokens, one browser component, one low-risk tool, and logging around latency, disconnects, cost, and transcript quality.
For broader automation planning, pair this with LinkLoot's /guides/ai-workflow-automation guide before connecting the voice agent to actions that can change customer data or infrastructure.
What to verify before you act
- Confirm whether your app should use stable AI SDK 7 packages or the canary packages referenced by the realtime quickstart.
- Test the exact model IDs for realtime voice, speech generation, and transcription.
- Keep AI Gateway API keys server-side and mint short-lived tokens for browsers.
- Measure latency, turn detection, disconnect behavior, transcript quality, and cost under realistic audio.
- Review data retention, logging, and enterprise controls before sending sensitive conversations through any provider.
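For the measurement item in that checklist, a minimal sketch of a per-session summary might look like the following. Everything here is illustrative: `TurnSample`, `summarize`, and the definition of latency (end of user speech to first agent audio) are assumptions for the example, not fields the AI SDK or AI Gateway is known to expose.

```typescript
// Hypothetical per-turn record collected by your own client-side logging.
interface TurnSample {
  latencyMs: number;     // assumed: end of user speech to first agent audio
  disconnected: boolean; // whether the session dropped during this turn
}

interface SessionSummary {
  turns: number;
  p50LatencyMs: number;
  p95LatencyMs: number;
  disconnectRate: number; // fraction of turns that ended in a disconnect
}

// Nearest-rank percentile over an ascending-sorted array; NaN when empty.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const index = Math.min(sortedAsc.length - 1, Math.max(0, rank - 1));
  return sortedAsc[index];
}

function summarize(samples: TurnSample[]): SessionSummary {
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  const disconnects = samples.filter((s) => s.disconnected).length;
  return {
    turns: samples.length,
    p50LatencyMs: percentile(latencies, 50),
    p95LatencyMs: percentile(latencies, 95),
    disconnectRate: samples.length === 0 ? 0 : disconnects / samples.length,
  };
}
```

Tracking the 95th percentile alongside the median matters for voice, because a handful of slow turns can make a conversation feel broken even when the typical turn is fast.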
Source check
Confirmed by: Vercel's AI Gateway realtime voice changelog, Vercel's realtime quickstart, and AI SDK speech documentation.
Early signal / context: Releasebot's Vercel feed for release-tracking context. Treat model support, pricing, and package stability as items to verify in Vercel's docs and your own account before production rollout.
