TL;DR: Every SSE connection in the Vercel AI SDK is scoped to a single HTTP request from a single browser tab. There is no mechanism in the protocol for one generation to reach more than one listener; this is how HTTP works, not a limitation of the SDK. Tab switches, device switches, and background agents all drop the stream with no reconnect path. The fix is a session that persists independently of any connection: the agent writes to a channel, and any client subscribed to that channel receives the stream. The ChatTransport interface is the integration point.
When useChat starts a generation, it opens a request to your server and the response streams back to that specific tab. Nothing in the protocol lets a second listener attach to a generation in progress.
Most AI applications end up with users who open a second tab, switch to their phone, share a session with a colleague, or start a generation and walk away. In every case, the stream doesn't follow them. Bridging that gap requires building something that SSE doesn't provide.
Why SSE is one-to-one by design
SSE is a standard HTTP response with a specific content type and line-based formatting. The client makes a request. The server streams a response. That connection belongs to the tab that made it.
If a second tab wants the stream, it has to make its own request and get its own response. The server responds to individual clients. It doesn't broadcast. There's no pub/sub layer built into SSE, no concept of channels or subscribers, and no way for a client to join a stream already in progress.
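The framing itself is trivial, which is part of why there's no room in it for richer semantics. A minimal encoder for the wire format, with field names taken from the SSE specification:

```typescript
// Sketch of the SSE wire format: each event is plain text written to
// the response body, terminated by a blank line. Field names ("id",
// "event", "data") come from the WHATWG SSE specification.
function formatSseEvent(data: string, id?: string, event?: string): string {
  let frame = "";
  if (id !== undefined) frame += `id: ${id}\n`;
  if (event !== undefined) frame += `event: ${event}\n`;
  // Multi-line payloads become repeated "data:" fields.
  for (const line of data.split("\n")) {
    frame += `data: ${line}\n`;
  }
  return frame + "\n"; // blank line terminates the event
}
```

Everything is addressed to the one client holding the response body. There's no field for a channel, a subscriber, or a replay position.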
Vercel couldn't fix this by shipping a different SDK. The constraint is in HTTP itself.
Where this surfaces in practice
Tab switches. A user opens the same conversation in a new tab. The stream doesn't carry over. The new tab has no connection to the generation in progress. This is the most commonly reported variant. One GitHub issue captures it directly: a developer consuming the streamText datastream server-side just to replicate it to multiple clients.
Device switches. You start a generation on your laptop and pick up your phone. Nothing has arrived. The stream is tied to the browser context on the laptop. The phone has no connection to it. One team building AI products described this plainly: "Multi-device synchronisation is an unsolved problem. ChatGPT doesn't do this."
Background agents. An agent runs for two minutes while you check something else. You navigate away. The agent completes its work and delivers it to no one. The results exist somewhere in a server process, but there's no listener to receive them.
Shared sessions. A team wants to see the same AI response live, during a review, an approval workflow, or a shared task. Each team member needs their own stream, which means either separate generations or a fan-out layer you build yourself. Neither option comes cheap.
What teams try first
Running the generation multiple times. The simplest approach: each client gets its own request and its own response. It works, but you pay for every duplicate generation. Language models are also non-deterministic, so two clients asking the same question get different answers. For most applications that's not acceptable.
Redis pub/sub. A common next step. The agent publishes tokens to Redis. A pool of workers reads from Redis and pushes to individual SSE clients. This solves fan-out well enough for simple cases. It breaks down when a client reconnects mid-stream. Redis pub/sub has no delivery guarantees and no history for catch-up. A client that dropped and reconnected gets the stream from the point it rejoined, not from where it left off.
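The catch-up gap is easy to see with a toy in-memory stand-in for Redis pub/sub (class and channel names here are hypothetical): publish is fire-and-forget to whoever is subscribed at that instant, so anything sent while a client is disconnected is simply gone.

```typescript
// Minimal in-memory pub/sub with Redis-like semantics: publish delivers
// only to current subscribers. No history, no delivery guarantees.
type Handler = (msg: string) => void;

class FireAndForgetBus {
  private subscribers = new Map<string, Set<Handler>>();

  subscribe(channel: string, handler: Handler): () => void {
    const set = this.subscribers.get(channel) ?? new Set<Handler>();
    set.add(handler);
    this.subscribers.set(channel, set);
    return () => set.delete(handler); // unsubscribe
  }

  publish(channel: string, msg: string): void {
    for (const handler of this.subscribers.get(channel) ?? []) handler(msg);
  }
}

const bus = new FireAndForgetBus();
const received: string[] = [];
let unsubscribe = bus.subscribe("session:42", (t) => received.push(t));

bus.publish("session:42", "The");
unsubscribe();                      // connection drops mid-stream
bus.publish("session:42", "quick"); // lost: no subscriber, no history
unsubscribe = bus.subscribe("session:42", (t) => received.push(t));
bus.publish("session:42", "fox");
// received is ["The", "fox"]: the token published during the gap is gone
```

Real Redis behaves the same way for pub/sub: a reconnecting subscriber resumes at the live edge, and filling in the gap requires a separate history store.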
Database polling. Clients poll a database for new tokens. This adds a database write, and its latency, to every token, plus constant polling reads, and the load scales poorly. A typical generation produces tens of tokens per second; at 50 tokens per second, that's roughly 180,000 writes per hour per active session. Most databases aren't designed for that pattern.
Server-side stream replication. Consume the stream on the server and forward it to multiple SSE connections manually. It works, but it requires maintaining state about which clients are connected, handling reconnects, and managing the lifecycle of each forwarded connection. You've effectively built most of a realtime system at this point, without the reliability guarantees.
Multi-device sessions
The model that makes this work is a session that persists independently of any connection. The agent writes tokens to a channel. Any client subscribed to that channel receives every token in realtime, regardless of which device it's on.
Late joiners load history from where they connected, or from the beginning if they arrive after the generation finishes. Clients that disconnect and reconnect pick up from their last position, not from the live edge. Presence on the channel tells the agent whether anyone is watching, which matters for workflows that adapt to user interaction.
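A minimal sketch of that model, with illustrative names rather than any real API: the agent appends tokens to a per-session log, subscribers replay from a chosen position, then receive everything new live.

```typescript
// Sketch of a session that outlives any one connection. The agent
// appends to a log; subscribers replay history from an offset, then
// receive new tokens live. Names are illustrative, not a real API.
type Listener = (token: string, offset: number) => void;

class DurableSession {
  private log: string[] = [];
  private listeners = new Set<Listener>();

  // Agent side: write a token; every current subscriber sees it.
  append(token: string): void {
    const offset = this.log.length;
    this.log.push(token);
    for (const l of this.listeners) l(token, offset);
  }

  // Client side: replay history from `fromOffset`, then stream live.
  // A late joiner passes 0; a reconnecting client passes its last offset.
  subscribe(listener: Listener, fromOffset = 0): () => void {
    for (let i = fromOffset; i < this.log.length; i++) listener(this.log[i], i);
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }
}
```

A phone that joins mid-generation subscribes from offset 0 and catches up; a laptop that dropped at offset 40 subscribes from 40 and misses nothing. The connection is disposable; the log is not.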
The ChatTransport interface in the AI SDK is the integration point for this. The transport handles the session mechanics. useChat doesn't change. The same hook that works for a single tab works across multiple tabs, devices, or users.
The same generation reaches any number of listeners, with history available for anyone who joins late or reconnects.
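The transport surface is small: in AI SDK 5, ChatTransport exposes sendMessages and reconnectToStream. A channel-backed sketch of that shape follows; the Channel interface and Chunk type here are simplified stand-ins (the real interface streams UIMessageChunk values via a ReadableStream, and takes richer options).

```typescript
// Channel-backed transport sketch. The method names (sendMessages,
// reconnectToStream) follow the AI SDK 5 ChatTransport interface;
// Channel and Chunk are illustrative stand-ins.
type Chunk = { type: string; delta?: string };

interface Channel {
  publishRequest(messages: unknown[]): void;        // hand work to the agent
  stream(fromOffset: number): AsyncIterable<Chunk>; // tokens from a position
  lastSeenOffset(): number;                         // this client's position
}

class ChannelChatTransport {
  constructor(private channel: Channel) {}

  // Start a generation: publish the request to the agent's channel,
  // then stream tokens from this client's current position.
  async sendMessages(opts: { messages: unknown[] }): Promise<AsyncIterable<Chunk>> {
    this.channel.publishRequest(opts.messages);
    return this.channel.stream(this.channel.lastSeenOffset());
  }

  // Rejoin an in-progress generation from the last position this client
  // saw: the operation plain SSE has no equivalent for.
  async reconnectToStream(): Promise<AsyncIterable<Chunk> | null> {
    return this.channel.stream(this.channel.lastSeenOffset());
  }
}
```

Because the channel, not the HTTP response, carries the tokens, every device running this transport against the same session sees the same generation.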
When you need this
For a single user on a single device with a stable connection, SSE works fine. A simple single-tab chatbot doesn't need anything more.
Multi-device delivery becomes relevant when users open more than one tab, move between desktop and mobile, share sessions with colleagues, or run agents in the background while doing other things. Most production AI applications hit at least one of these in their first few months.
The hardest case is background agents. Users don't think about whether they're connected to an agent while it's running. They start it and move on. Getting results to them when the agent completes, regardless of what device they're on or whether they stayed on the page, is a session layer problem that SSE cannot solve.
What to look for in a transport for multi-device delivery
Multi-client delivery. One agent publish should reach every connected subscriber simultaneously. Connected clients get live streaming. Clients that reconnect get automatic catch-up. This shouldn't require per-client connection management.
Offset-based reconnection. The transport should track message positions and replay only what the client missed. Without offsets, a reconnecting client receives everything from the start or nothing.
Presence. The transport should surface whether any clients are subscribed to the session. Agents running in the background need a way to know whether anyone is watching — which matters for cost control and workflows that adapt to user activity.
Protocol fallback. Enterprise networks and mobile connections don't always allow WebSocket upgrades. A transport that negotiates the protocol automatically (WebSocket, falling back to HTTP streaming, then long-polling) reaches clients on restricted networks without extra handling.
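The negotiation itself is just an ordered list of attempts. A sketch, with protocol names and a probe function that are illustrative rather than any specific library's API:

```typescript
// Hypothetical protocol fallback: try the richest transport first and
// degrade until one connects. canConnect stands in for a real probe.
type Protocol = "websocket" | "http-streaming" | "long-polling";

const FALLBACK_ORDER: Protocol[] = ["websocket", "http-streaming", "long-polling"];

function negotiate(canConnect: (p: Protocol) => boolean): Protocol {
  for (const p of FALLBACK_ORDER) {
    if (canConnect(p)) return p; // first protocol the network allows
  }
  throw new Error("no transport available");
}
```

On an open network the client lands on WebSocket; behind a proxy that strips upgrade headers, the same code lands on HTTP streaming or long-polling with no application changes.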
Ably AI Transport integrates with the Vercel AI SDK to add durable sessions, multi-device sync, and bidirectional control to your chat application. Visit the Ably AI Transport overview, read the documentation, or sign up free.
Sources: AI SDK UI Transport documentation; GitHub issue #6090 ("I consume the streamText datastream server-side to replicate it to clients").
Recommended Articles
Durable sessions for Vercel AI SDK applications
Vercel AI SDK's SSE transport breaks in production: proxy buffering, no reconnect, serverless limits. ChatTransport makes it swappable. Options compared.
Vercel AI SDK ChatTransport: implementing a custom WebSocket transport
ChatTransport in Vercel AI SDK 5 lets you replace the default HTTP transport with WebSockets. Application code, agents, and UI stay unchanged.
Vercel AI SDK resumable-stream: what it covers and what it doesn't
Vercel's resumable-stream covers page reloads only. Tab switches, mobile backgrounding, and device switches lose the stream. Also incompatible with stop().