TL;DR: Every SSE connection in the Vercel AI SDK is scoped to a single HTTP request from a single browser tab. There is no mechanism in the protocol for one generation to reach more than one listener; this is how HTTP works, not a limitation of the SDK. Tab switches, device switches, and background agents all drop the stream with no reconnect path. The fix is a session that persists independently of any connection: the agent writes to a channel, and any client subscribed to that channel receives the stream. The ChatTransport interface is the integration point.
When useChat starts a generation, it opens a request to your server and the response streams back to that specific tab. Nothing in the protocol lets a second listener attach to a generation in progress.
Most AI applications end up with users who open a second tab, switch to their phone, share a session with a colleague, or start a generation and walk away. In every case, the stream doesn't follow them. Bridging that gap requires building something that SSE doesn't provide.
Why SSE is one-to-one by design
SSE is a standard HTTP response with a specific content type and line-based formatting. The client makes a request. The server streams a response. That connection belongs to the tab that made it.
If a second tab wants the stream, it has to make its own request and get its own response. The server responds to individual clients. It doesn't broadcast. There's no pub/sub layer built into SSE, no concept of channels or subscribers, and no way for a client to join a stream already in progress.
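The framing itself is trivial, which is part of why there's no room in it for richer semantics. A minimal encoder for the wire format, with field names taken from the SSE specification:

```typescript
// Sketch of the SSE wire format: each event is plain text written to
// the response body, terminated by a blank line. Field names ("id",
// "event", "data") come from the WHATWG SSE specification.
function formatSseEvent(data: string, id?: string, event?: string): string {
  let frame = "";
  if (id !== undefined) frame += `id: ${id}\n`;
  if (event !== undefined) frame += `event: ${event}\n`;
  // Multi-line payloads become repeated "data:" fields.
  for (const line of data.split("\n")) {
    frame += `data: ${line}\n`;
  }
  return frame + "\n"; // blank line terminates the event
}
```

Everything is addressed to the one client holding the response body. There's no field for a channel, a subscriber, or a replay position.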
Vercel couldn't fix this by shipping a different SDK. The constraint is in HTTP itself.
Where this surfaces in practice
Tab switches. A user opens the same conversation in a new tab. The stream doesn't carry over. The new tab has no connection to the generation in progress. This is the most commonly reported variant. One GitHub issue captures it directly: a developer consuming the streamText datastream server-side just to replicate it to multiple clients.
Device switches. You start a generation on your laptop and pick up your phone. Nothing has arrived. The stream is tied to the browser context on the laptop. The phone has no connection to it. One team building AI products described this plainly: "Multi-device synchronisation is an unsolved problem. ChatGPT doesn't do this."
Background agents. An agent runs for two minutes while you check something else. You navigate away. The agent completes its work and delivers it to no one. The results exist somewhere in a server process, but there's no listener to receive them.
Shared sessions. A team wants to see the same AI response live, during a review, an approval workflow, or a shared task. Each team member needs their own stream, which means either separate generations or a fan-out layer you build yourself. Neither option comes cheap.
What teams try first
Running the generation multiple times. The simplest approach: each client gets its own request and its own response. It works, but you pay for every duplicate generation. Language models are also non-deterministic, so two clients asking the same question get different answers. For most applications that's not acceptable.
Redis pub/sub. A common next step. The agent publishes tokens to Redis. A pool of workers reads from Redis and pushes to individual SSE clients. This solves fan-out well enough for simple cases. It breaks down when a client reconnects mid-stream. Redis pub/sub has no delivery guarantees and no history for catch-up. A client that dropped and reconnected gets the stream from the point it rejoined, not from where it left off.
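The catch-up gap is easy to see with a toy in-memory stand-in for Redis pub/sub (class and channel names here are hypothetical): publish is fire-and-forget to whoever is subscribed at that instant, so anything sent while a client is disconnected is simply gone.

```typescript
// Minimal in-memory pub/sub with Redis-like semantics: publish delivers
// only to current subscribers. No history, no delivery guarantees.
type Handler = (msg: string) => void;

class FireAndForgetBus {
  private subscribers = new Map<string, Set<Handler>>();

  subscribe(channel: string, handler: Handler): () => void {
    const set = this.subscribers.get(channel) ?? new Set<Handler>();
    set.add(handler);
    this.subscribers.set(channel, set);
    return () => set.delete(handler); // unsubscribe
  }

  publish(channel: string, msg: string): void {
    for (const handler of this.subscribers.get(channel) ?? []) handler(msg);
  }
}

const bus = new FireAndForgetBus();
const received: string[] = [];
let unsubscribe = bus.subscribe("session:42", (t) => received.push(t));

bus.publish("session:42", "The");
unsubscribe();                      // connection drops mid-stream
bus.publish("session:42", "quick"); // lost: no subscriber, no history
unsubscribe = bus.subscribe("session:42", (t) => received.push(t));
bus.publish("session:42", "fox");
// received is ["The", "fox"]: the token published during the gap is gone
```

Real Redis behaves the same way for pub/sub: a reconnecting subscriber resumes at the live edge, and filling in the gap requires a separate history store.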
Database polling. Clients poll a database for new tokens. This adds a database write, and its latency, to every token, plus constant polling reads, and the load scales poorly. A typical generation produces tens of tokens per second; at 50 tokens per second, that's roughly 180,000 writes per hour per active session. Most databases aren't designed for that pattern.
Server-side stream replication. Consume the stream on the server and forward it to multiple SSE connections manually. It works, but it requires maintaining state about which clients are connected, handling reconnects, and managing the lifecycle of each forwarded connection. You've effectively built most of a realtime system at this point, without the reliability guarantees.
Multi-device sessions
The model that makes this work is a session that persists independently of any connection. The agent writes tokens to a channel. Any client subscribed to that channel receives every token in realtime, regardless of which device it's on.
Late joiners load history from where they connected, or from the beginning if they arrive after the generation finishes. Clients that disconnect and reconnect pick up from their last position, not from the live edge. Presence on the channel tells the agent whether anyone is watching, which matters for workflows that adapt to user interaction.
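A minimal sketch of that model, with illustrative names rather than any real API: the agent appends tokens to a per-session log, subscribers replay from a chosen position, then receive everything new live.

```typescript
// Sketch of a session that outlives any one connection. The agent
// appends to a log; subscribers replay history from an offset, then
// receive new tokens live. Names are illustrative, not a real API.
type Listener = (token: string, offset: number) => void;

class DurableSession {
  private log: string[] = [];
  private listeners = new Set<Listener>();

  // Agent side: write a token; every current subscriber sees it.
  append(token: string): void {
    const offset = this.log.length;
    this.log.push(token);
    for (const l of this.listeners) l(token, offset);
  }

  // Client side: replay history from `fromOffset`, then stream live.
  // A late joiner passes 0; a reconnecting client passes its last offset.
  subscribe(listener: Listener, fromOffset = 0): () => void {
    for (let i = fromOffset; i < this.log.length; i++) listener(this.log[i], i);
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }
}
```

A phone that joins mid-generation subscribes from offset 0 and catches up; a laptop that dropped at offset 40 subscribes from 40 and misses nothing. The connection is disposable; the log is not.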
The ChatTransport interface in the AI SDK is the integration point for this. The transport handles the session mechanics. useChat doesn't change. The same hook that works for a single tab works across multiple tabs, devices, or users.
The same generation reaches any number of listeners, with history available for anyone who joins late or reconnects.
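The transport surface is small: in AI SDK 5, ChatTransport exposes sendMessages and reconnectToStream. A channel-backed sketch of that shape follows; the Channel interface and Chunk type here are simplified stand-ins (the real interface streams UIMessageChunk values via a ReadableStream, and takes richer options).

```typescript
// Channel-backed transport sketch. The method names (sendMessages,
// reconnectToStream) follow the AI SDK 5 ChatTransport interface;
// Channel and Chunk are illustrative stand-ins.
type Chunk = { type: string; delta?: string };

interface Channel {
  publishRequest(messages: unknown[]): void;        // hand work to the agent
  stream(fromOffset: number): AsyncIterable<Chunk>; // tokens from a position
  lastSeenOffset(): number;                         // this client's position
}

class ChannelChatTransport {
  constructor(private channel: Channel) {}

  // Start a generation: publish the request to the agent's channel,
  // then stream tokens from this client's current position.
  async sendMessages(opts: { messages: unknown[] }): Promise<AsyncIterable<Chunk>> {
    this.channel.publishRequest(opts.messages);
    return this.channel.stream(this.channel.lastSeenOffset());
  }

  // Rejoin an in-progress generation from the last position this client
  // saw: the operation plain SSE has no equivalent for.
  async reconnectToStream(): Promise<AsyncIterable<Chunk> | null> {
    return this.channel.stream(this.channel.lastSeenOffset());
  }
}
```

Because the channel, not the HTTP response, carries the tokens, every device running this transport against the same session sees the same generation.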
When you need this
For a single user on a single device with a stable connection, SSE works fine. A simple single-tab chatbot doesn't need anything more.
Multi-device delivery becomes relevant when users open more than one tab, move between desktop and mobile, share sessions with colleagues, or run agents in the background while doing other things. Most production AI applications hit at least one of these in their first few months.
The hardest case is background agents. Users don't think about whether they're connected to an agent while it's running. They start it and move on. Getting results to them when the agent completes, regardless of what device they're on or whether they stayed on the page, is a session layer problem that SSE cannot solve.
What to look for in a transport for multi-device delivery
Multi-client delivery. One agent publish should reach every connected subscriber simultaneously. Connected clients get live streaming. Clients that reconnect get automatic catch-up. This shouldn't require per-client connection management.
Offset-based reconnection. The transport should track message positions and replay only what the client missed. Without offsets, a reconnecting client receives everything from the start or nothing.
Presence. The transport should surface whether any clients are subscribed to the session. Agents running in the background need a way to know whether anyone is watching — which matters for cost control and workflows that adapt to user activity.
Protocol fallback. Enterprise networks and mobile connections don't always allow WebSocket upgrades. A transport that negotiates the protocol automatically (WebSocket, falling back to HTTP streaming, then long-polling) reaches clients on restricted networks without extra handling.
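The negotiation itself is just an ordered list of attempts. A sketch, with protocol names and a probe function that are illustrative rather than any specific library's API:

```typescript
// Hypothetical protocol fallback: try the richest transport first and
// degrade until one connects. canConnect stands in for a real probe.
type Protocol = "websocket" | "http-streaming" | "long-polling";

const FALLBACK_ORDER: Protocol[] = ["websocket", "http-streaming", "long-polling"];

function negotiate(canConnect: (p: Protocol) => boolean): Protocol {
  for (const p of FALLBACK_ORDER) {
    if (canConnect(p)) return p; // first protocol the network allows
  }
  throw new Error("no transport available");
}
```

On an open network the client lands on WebSocket; behind a proxy that strips upgrade headers, the same code lands on HTTP streaming or long-polling with no application changes.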
Ably AI Transport integrates with the Vercel AI SDK to add durable sessions, multi-device sync, and bidirectional control to your chat application. Visit the Ably AI Transport overview, read the documentation, or sign up free.
Sources: AI SDK UI Transport documentation; GitHub issue #6090 ("I consume the streamText datastream server-side to replicate it to clients").
Recommended Articles
Durable sessions for Vercel AI SDK applications
Vercel AI SDK's SSE transport breaks in production: proxy buffering, no reconnect, serverless limits. ChatTransport makes it swappable. Options compared.
Vercel AI SDK ChatTransport: implementing a custom WebSocket transport
ChatTransport in Vercel AI SDK 5 lets you replace the default HTTP transport with WebSockets. Application code, agents, and UI stay unchanged.
Vercel AI SDK resumable-stream: what it covers and what it doesn't
Vercel's resumable-stream covers page reloads only. Tab switches, mobile backgrounding, and device switches lose the stream. Also incompatible with stop().