Build voice agents through Vercel AI Gateway, but keep the beta label visible

Official Vercel changelog image for AI Gateway realtime voice support. Credit: Vercel

Vercel AI Gateway now supports realtime voice, speech generation, and transcription through AI SDK 7, with beta limits and server-minted tokens to keep keys out of browsers.

Vercel AI Gateway now supports realtime voice agents, speech generation, and transcription through the AI SDK. Confidence level: confirmed beta. The useful part is not just audio support but what comes with it: routing, observability, spend controls, and bring-your-own-key handling in the same gateway layer teams may already use for text, image, and video models.


What changed

Vercel says AI Gateway now supports voice and audio models for three main jobs: realtime speech-to-speech agents, text-to-speech generation, and speech-to-text transcription. The capabilities are in beta and available through AI SDK 7.

The realtime quickstart shows a server route minting a short-lived token and a browser client connecting through the AI SDK realtime hook. That matters because the browser should never receive the AI Gateway API key.
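
The server side of that pattern can be sketched without committing to any particular SDK signature. In the minimal TypeScript sketch below, `ClientToken`, `MintToken`, and `handleTokenRequest` are hypothetical names, the 60-second TTL is an assumption, and the minting call is injected rather than imported, because the exact AI SDK helper and its return shape should be taken from Vercel's quickstart. The only point it illustrates is that the long-lived key stays on the server and the response carries nothing but the short-lived token.

```typescript
// Hypothetical shape of a short-lived client token. The real payload returned
// by the gateway may differ; check Vercel's realtime quickstart.
interface ClientToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical minting function, injected so this sketch does not guess at
// the actual SDK call or its parameters.
type MintToken = (apiKey: string, ttlSeconds: number) => Promise<ClientToken>;

interface RouteResult {
  status: number;
  body: { token: string; expiresAt: number } | { error: string };
}

// Core logic of a server-side token route.
async function handleTokenRequest(
  apiKey: string | undefined,
  mint: MintToken,
  ttlSeconds: number = 60, // assumed TTL, not a documented default
): Promise<RouteResult> {
  if (!apiKey) {
    // Fail closed without revealing configuration details to the client.
    return { status: 500, body: { error: "token service unavailable" } };
  }
  const minted = await mint(apiKey, ttlSeconds);
  // Copy only the fields the browser needs; the API key never enters the body.
  return {
    status: 200,
    body: { token: minted.token, expiresAt: minted.expiresAt },
  };
}
```

In a real route you would read the key from a server-only environment variable and pass in whatever minting helper the quickstart documents, then serialize `body` as the JSON response.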

Why this is early

This is early because the changelog is dated June 29, 2026 and the realtime quickstart still points at canary AI SDK packages for the AI Gateway provider. The capability is official, but developers should treat production use as beta until the package channel, model list, pricing, and provider behavior settle.

Releasebot provides external release-feed context, while Vercel's changelog and docs carry the actual implementation details. The strongest evidence is Vercel's own docs, not community examples or Product Hunt-style launch posts.

Key takeaways

  • AI Gateway now covers realtime voice, speech generation, and transcription.
  • Vercel describes the feature as beta and available via AI SDK 7.
  • Realtime browser agents should use a server-minted short-lived token.
  • Voice support sits behind the same observability and spend controls as other AI Gateway model traffic.
  • Model support varies, so test the exact realtime, speech, or transcription model before shipping.

| Capability | Use it for | Access path | Caveat |
| --- | --- | --- | --- |
| Realtime voice | Live speech-to-speech agents | AI Gateway realtime model plus AI SDK hook | Beta; provider behavior and latency need testing |
| Speech generation | Audio replies, voiceovers, content narration | AI SDK generateSpeech | Voice, format, and language support vary by provider |
| Transcription | Voice notes, uploaded audio, call summaries | AI Gateway and AI SDK audio flow | Accuracy, file limits, and privacy controls need validation |
| Server token route | Browser voice apps | getToken on the server | Do not expose gateway keys to clients |

Availability and access

The changelog says the feature is available through AI SDK 7, while the realtime quickstart references canary packages for the gateway provider and React bindings. That means teams should pin versions deliberately and avoid assuming every AI SDK 7 install has the same realtime behavior.
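
One way to make "pin deliberately" checkable is a small script that flags any watched package whose version spec is a range or a dist-tag rather than an exact version. The TypeScript sketch below is illustrative only: `findUnpinned` is a hypothetical helper, the package names are assumptions to confirm against the quickstart, and the version numbers are made up for the example.

```typescript
// Matches an exact semver version such as "7.0.0" or "7.0.0-canary.12".
// Ranges ("^7.0.0", "~7.0.0") and dist-tags ("canary", "latest") do not match.
const EXACT_VERSION =
  /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Returns the watched package names that are missing or not pinned exactly.
function findUnpinned(
  dependencies: Record<string, string>,
  watched: string[],
): string[] {
  return watched.filter((name) => {
    const spec = dependencies[name];
    return spec === undefined || !EXACT_VERSION.test(spec);
  });
}

// Illustrative dependency map; names and versions are assumptions.
const exampleDeps: Record<string, string> = {
  "ai": "^7.0.0",                       // caret range: not pinned
  "@ai-sdk/gateway": "7.0.0-canary.12", // exact prerelease: pinned
  "@ai-sdk/react": "canary",            // dist-tag: not pinned
};

const unpinned = findUnpinned(exampleDeps, [
  "ai",
  "@ai-sdk/gateway",
  "@ai-sdk/react",
]);
console.log(unpinned); // lists "ai" and "@ai-sdk/react"
```

An exact canary version still counts as pinned here, which is the intent: the risk is a floating tag that silently moves between installs, not the prerelease channel itself.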

Vercel also says AI Gateway adds no markup or platform fees for these capabilities, but you still need to check upstream model costs, gateway billing, provider availability, rate limits, and whether your chosen model supports the audio mode you need.

Practical LinkLoot angle

This is useful for teams that want one gateway for text, image, video, voice, and transcription instead of stitching separate provider SDKs into every product feature. A practical first build is a private internal voice agent: one route for short-lived tokens, one browser component, one low-risk tool, and logging around latency, disconnects, cost, and transcript quality.
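
The logging half of that first build can start very small. The TypeScript sketch below summarizes latency, disconnects, and cost across sessions; the record fields, the `summarize` helper, and the choice of nearest-rank percentiles are all assumptions for illustration, not anything Vercel prescribes. Transcript quality is left out because it usually needs human or model-assisted review rather than a counter.

```typescript
// One record per completed voice session. Field names are illustrative.
interface SessionRecord {
  turnLatenciesMs: number[]; // per-turn delay before the agent's first audio
  disconnected: boolean;     // true if the connection dropped unexpectedly
  costUsd: number;           // cost attributed to the session
}

interface Summary {
  sessions: number;
  disconnectRate: number;
  p50LatencyMs: number;
  p95LatencyMs: number;
  totalCostUsd: number;
}

// Nearest-rank percentile over an ascending-sorted array; 0 for empty input.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length, Math.max(1, rank)) - 1;
  return sorted[index] ?? 0;
}

function summarize(records: SessionRecord[]): Summary {
  // Pool every turn latency across sessions, then sort ascending.
  const latencies: number[] = [];
  for (const record of records) {
    latencies.push(...record.turnLatenciesMs);
  }
  latencies.sort((a, b) => a - b);

  const disconnects = records.filter((r) => r.disconnected).length;
  const totalCostUsd = records.reduce((sum, r) => sum + r.costUsd, 0);

  return {
    sessions: records.length,
    disconnectRate: records.length === 0 ? 0 : disconnects / records.length,
    p50LatencyMs: percentile(latencies, 50),
    p95LatencyMs: percentile(latencies, 95),
    totalCostUsd,
  };
}
```

Tracking p95 alongside the median matters for voice in particular, since a few slow turns are what users notice in a live conversation.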

For broader automation planning, pair this with LinkLoot's /guides/ai-workflow-automation guide before connecting the voice agent to actions that can change customer data or infrastructure.

What to verify before you act

  • Confirm whether your app should use stable AI SDK 7 packages or the canary packages referenced by the realtime quickstart.
  • Test the exact model IDs for realtime voice, speech generation, and transcription.
  • Keep AI Gateway API keys server-side and mint short-lived tokens for browsers.
  • Measure latency, turn detection, disconnect behavior, transcript quality, and cost under realistic audio.
  • Review data retention, logging, and enterprise controls before sending sensitive conversations through any provider.
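
On the browser side, short-lived tokens imply some refresh logic. The AI SDK realtime hook may already handle this, so check the quickstart first; if you do need to manage it yourself, the idea looks roughly like the TypeScript sketch below. `ClientToken`, `needsRefresh`, `TokenCache`, and the 5-second safety margin are hypothetical, and the fetcher is injected so the sketch assumes nothing about your token route's URL or response format.

```typescript
// Hypothetical token shape; align it with what your token route returns.
interface ClientToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// True when there is no token or it expires within the safety margin.
function needsRefresh(
  token: ClientToken | null,
  nowMs: number,
  marginMs: number = 5000, // assumed margin, tune for your latency
): boolean {
  return token === null || token.expiresAt - nowMs <= marginMs;
}

// Keeps the token in memory only and refetches when it is close to expiry.
class TokenCache {
  private current: ClientToken | null = null;
  private readonly fetchToken: () => Promise<ClientToken>;
  private readonly now: () => number;

  constructor(
    fetchToken: () => Promise<ClientToken>, // e.g. calls your own server route
    now: () => number = () => Date.now(),
  ) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  async get(): Promise<string> {
    let token = this.current;
    if (token === null || needsRefresh(token, this.now())) {
      token = await this.fetchToken();
      this.current = token;
    }
    return token.token;
  }
}
```

Holding the token in memory rather than in persistent browser storage keeps its exposure window close to its lifetime, which is the reason for minting short-lived tokens in the first place.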

Source check

Confirmed by: Vercel's AI Gateway realtime voice changelog, Vercel's realtime quickstart, and AI SDK speech documentation.

Early signal / context: Releasebot's Vercel feed for release-tracking context. Treat model support, pricing, and package stability as items to verify in Vercel's docs and your own account before production rollout.

FAQ

Does Vercel AI Gateway support realtime voice agents?

Yes. Vercel says AI Gateway now supports realtime voice agents, speech generation, and transcription in beta.