Is WebSockets enough for AI chat?

WebSockets get the transport right. But session state is a different problem, and most AI chat deployments find that out the hard way.

WebSockets are the right protocol for production AI chat. But that fact doesn’t prevent the failure most teams hit first. An enterprise load balancer closes an idle connection after 60 seconds while the agent waits on a tool call. Your reconnect logic fires in under a second and the agent keeps running server-side, but the client never receives anything produced during the gap: no tokens, no tool call results, no context.

The reconnected socket has no view of what happened while it was down. Three conditions cause this routinely: a proxy timeout mid-task, a page reload mid-generation, and a mobile network handoff. All three break the session for the same underlying reason: the WebSocket protocol handles transport, not session state, and reconnection logic doesn’t change that.

Key takeaways

  • WebSockets are the right protocol for production AI chat: bidirectional, persistent, and suited to live steering and tool calls in ways SSE isn’t.
  • A WebSocket connection carries no session state. When it closes, whether through a proxy timeout, page reload, or device switch, any state tied to that connection disappears with it.
  • Reconnection logic re-establishes the transport. It does not recover the tokens, tool calls, or agent context that were in flight when the connection dropped.
  • What fills the gap is a session layer: infrastructure that persists conversation state against a session ID and replays it to reconnecting clients.

What WebSockets get right for AI chat

The protocol question is worth settling early, because the rest of this piece is about the infrastructure layer above it. For production AI chat, the choice is WebSockets or SSE. Both stream tokens to the client, but only WebSockets let signals flow the other way.

WebSockets are bidirectional. When your user cancels mid-stream, that signal travels back on the same channel; tool call confirmations and workflow approvals work the same way. When a workflow pauses for human input mid-execution, that input must arrive in-band, not via a polling endpoint.
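
A minimal sketch of what that return channel might carry, assuming a JSON message format invented for this example. The message types (`cancel`, `tool_confirmation`, `human_input`), the `runId` and `toolCallId` fields, and the `RunState` shape are all hypothetical; they are not part of any library or of the WebSocket protocol itself, which only moves frames.

```typescript
// Messages the client sends back to the agent over the same WebSocket.
// The message types and field names are hypothetical, defined for this sketch only.
type ClientToAgent =
  | { type: "cancel"; runId: string }
  | { type: "tool_confirmation"; runId: string; toolCallId: string; approved: boolean }
  | { type: "human_input"; runId: string; text: string };

// Hypothetical agent-side state for one in-progress run.
interface RunState {
  cancelled: boolean;
  approvals: Map<string, boolean>;
  humanInput: string[];
}

// Parses one inbound frame and applies it to the matching run.
// Returns false when the frame names a run the agent doesn't know about.
function handleClientFrame(raw: string, runs: Map<string, RunState>): boolean {
  const msg = JSON.parse(raw) as ClientToAgent;
  const run = runs.get(msg.runId);
  if (!run) return false;
  switch (msg.type) {
    case "cancel":
      // A generation loop would check this flag between tokens and stop early.
      run.cancelled = true;
      break;
    case "tool_confirmation":
      // Records the user's decision for the tool call awaiting approval.
      run.approvals.set(msg.toolCallId, msg.approved);
      break;
    case "human_input":
      // Queues input for a workflow paused mid-execution, delivered in-band.
      run.humanInput.push(msg.text);
      break;
  }
  return true;
}
```

On the client, cancelling is then a single `socket.send(JSON.stringify({ type: "cancel", runId: "run-123" }))` on the already-open WebSocket (the run ID is illustrative); no second endpoint is involved.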

SSE is a one-way stream. For simple chatbots on stable networks, that doesn’t matter. Add tool calls, mid-stream cancellation, or multi-device continuity, and it does.

Where production AI connections actually fail

Not all connection drops come from bad network conditions. The more common causes in production are infrastructure defaults designed for HTTP requests, not AI chat. A response can be mid-generation for tens of seconds, and most defaults weren’t built for that.

  • AWS Application Load Balancer idle timeout. AWS ALB closes connections idle for 60 seconds by default, per AWS’s Application Load Balancer documentation. For standard HTTP that’s generous. For an agent waiting on a downstream API, 60 seconds of silence is routine, and the connection closes without warning. Your user’s response stops mid-sentence with no explanation.
  • Cloudflare proxy timeout. On Cloudflare Free and Pro plans, WebSocket connections terminate after 100 seconds of inactivity, as documented in Cloudflare’s WebSocket troubleshooting guide. Enterprise plans can raise this limit; on Free and Pro plans, the ceiling is fixed.
  • Mobile network handoffs. Switching from WiFi to cellular drops the underlying TCP connection immediately, taking the WebSocket with it. On mobile this happens during normal use: walking between coverage areas, backgrounding the tab, entering a building.
  • Page reload and tab crash. Users reload mid-generation, and browsers crash; neither is rare. The connection closes, and any session state tied to it is gone unless something stored it elsewhere.

Why reconnection logic doesn’t fix the session problem

The standard reconnection pattern re-establishes the socket. Transport recovers in milliseconds. But it cannot restore the state that was in flight when the connection dropped.
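
The standard pattern can be sketched as exponential backoff with full jitter. This is a minimal illustration: `backoffDelayMs`, `connectWithRetry`, and the default delays are chosen for the example rather than taken from any SDK. Note what a successful return gives you: a new transport object, and nothing from the session that was in flight.

```typescript
// Full-jitter exponential backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Calls `open` until it resolves or attempts run out, sleeping between failures.
// A successful result is a fresh transport; no session state comes back with it.
async function connectWithRetry<T>(
  open: () => Promise<T>,
  maxAttempts = 8,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await open();
    } catch (err) {
      lastError = err;
      // No point waiting after the final attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Jitter spreads reconnects from many clients after a shared outage, such as a load balancer restart, so they don't all retry at the same instant. None of this touches the three kinds of state described below.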

Token stream position. The response kept generating while the connection was dark. Those tokens went nowhere. When the client reconnects, it arrives mid-sentence or finds nothing at all.

Tool call results. Some chat responses depend on realtime data: a lookup, a search, or an action your user triggered. If the connection dropped while the agent was waiting for that result, the response either never arrived or ended before it could use the information.

Agent context. In a multi-turn exchange, the agent accumulates context: what was asked, what was answered, and what’s in progress. When a session drops and reconnects without state recovery, the agent and the client are at different points in the same conversation. Your users experience this as a loss of thread: a response that ignores what came before, or one that repeats something already answered.

The pattern most teams reach for is a Redis buffer: sequence number tracking, offset storage, and deduplication keys between the agent and the client. It handles full-page reloads. It tends to break on deploy-triggered reconnects, mobile handoffs that hit the reconnect window twice, and anything that generates messages faster than the buffer drains.
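
A minimal in-memory sketch of that buffer, standing in for Redis, shows where it breaks. It assumes integer sequence numbers and a fixed capacity; `SequencedBuffer`, `append`, and `readAfter` are hypothetical names. When the agent produces more messages than the buffer holds before the client returns, the oldest ones are evicted and the client's offset falls into a gap that cannot be replayed.

```typescript
// One buffered message with the sequence number the agent assigned to it.
interface Sequenced<T> {
  seq: number;
  payload: T;
}

// Bounded in-memory stand-in for the Redis buffer between agent and client.
class SequencedBuffer<T> {
  private items: Sequenced<T>[] = [];
  private nextSeq = 0;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Agent side: stamp the next sequence number; evict the oldest entry when over capacity.
  append(payload: T): number {
    const seq = this.nextSeq++;
    this.items.push({ seq, payload });
    if (this.items.length > this.capacity) this.items.shift();
    return seq;
  }

  // Client side: everything after `lastSeen`, or null if some of it was already evicted.
  readAfter(lastSeen: number): Sequenced<T>[] | null {
    const oldest = this.items.length > 0 ? this.items[0].seq : this.nextSeq;
    if (lastSeen + 1 < oldest) return null; // unrecoverable gap
    return this.items.filter((m) => m.seq > lastSeen);
  }
}
```

Returning `null` rather than a partial list matters: silently replaying from the oldest retained message would hand the client a response with a hole in the middle.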

Even the lead of Vercel's AI SDK built a pluggable interface to fill this gap. Teams that reach this point typically build the same infrastructure from scratch and then own it indefinitely. Reconnection handles the protocol layer; session state sits one layer above it, and it's a separate problem entirely.

What production AI chat needs from the transport layer

Any viable approach to production AI sessions needs to satisfy four requirements. These are implementation-neutral: what any infrastructure option has to provide, regardless of vendor.

  • Persistent state storage. Conversation history, token positions, tool call inputs and outputs, and agent state must be stored against a stable session ID and survive connection drops. The session ID is the anchor: the same session must be addressable after any reconnect, from any device.
  • Offset-based replay. A returning client requests messages from its last received serial. The infrastructure delivers everything missed, in order, with no duplicates. The client supplies its offset; the infrastructure fills the gap.
  • Protocol fallback. When a WebSocket upgrade is blocked by a proxy or firewall, the transport degrades to HTTP streaming or long-polling automatically. This should not require per-deployment configuration.
  • Multi-device delivery. Any authenticated device subscribing to a session ID receives the current state plus history. The session is not bound to the tab, browser, or device that opened it.
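
The offset-based replay requirement can be sketched on the client as a cursor that accepts messages from both a history fetch and the live stream, in whatever order they arrive, and emits each serial exactly once and in order. This assumes contiguous integer serials, a simplification for the example rather than any vendor's serial format; `ReplayCursor` and `SessionMessage` are hypothetical names.

```typescript
// A session message as this sketch models it: a contiguous integer serial plus data.
interface SessionMessage {
  serial: number;
  data: string;
}

// Merges replayed history and live messages, dropping duplicates and delivering in order.
class ReplayCursor {
  private lastDelivered: number;
  private pending = new Map<number, SessionMessage>();
  private readonly deliver: (m: SessionMessage) => void;

  constructor(lastDelivered: number, deliver: (m: SessionMessage) => void) {
    this.lastDelivered = lastDelivered;
    this.deliver = deliver;
  }

  // The offset to send when resuming: "everything after this serial".
  get resumeFrom(): number {
    return this.lastDelivered;
  }

  accept(m: SessionMessage): void {
    // Already delivered or already waiting: a duplicate from history/live overlap.
    if (m.serial <= this.lastDelivered || this.pending.has(m.serial)) return;
    this.pending.set(m.serial, m);
    // Flush every message that now forms a contiguous run after lastDelivered.
    let next = this.pending.get(this.lastDelivered + 1);
    while (next) {
      this.pending.delete(next.serial);
      this.lastDelivered = next.serial;
      this.deliver(next);
      next = this.pending.get(this.lastDelivered + 1);
    }
  }
}
```

Each device keeps its own cursor, so multi-device delivery reduces to several subscribers replaying the same session from different offsets.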

How Ably AI Transport solves the session layer problem

You don't need to build that infrastructure yourself. Ably AI Transport provides that durable session layer: it keeps the user experience intact through failures the WebSocket protocol cannot handle on its own. The session lives in Ably, and your application talks to it.

[Diagram: channel as session. The agent publishes tokens and events; clients subscribe from any device and catch up on reconnect.]

The five failures raised in this article each map directly to a capability:

Connection drops from proxy timeouts, mobile handoffs, and page reloads. The transport degrades automatically from WebSocket to HTTP streaming to long-polling, so the session survives the infrastructure defaults that break standard WebSocket connections. No per-deployment configuration is required.

Docs: Reconnection and recovery

Tokens generated while the client was disconnected. The token stream is stored against the session. On reconnect, the client receives everything it missed in order, with no duplicates. The developer doesn't track offsets or implement catch-up logic. 

Docs: Token streaming

Tool call results and agent context lost mid-task. Agent state, tool call inputs and outputs, and conversation history are all published to the session as they are produced. A reconnecting client recovers the full context, not just the tokens.

Docs: Reconnection and recovery

Mid-stream steering and human-in-the-loop signals. Cancellations, approvals, and human input travel back to the agent on the same session channel. The bidirectional requirement that rules out SSE is covered without a separate signaling mechanism. 

Docs: Human in the loop

Sessions tied to a single tab or device. Any authenticated device subscribing to the session ID receives current state plus history. A conversation started on desktop continues on mobile without restarting.

Docs: Multi-device sessions

Get started: Vercel AI SDK · Core SDK

Frequently asked questions

When is SSE still the right choice for AI chat?

SSE is a reasonable starting point for chatbots that follow a simple request-response pattern: a user submits a message, the server streams tokens, and nothing needs to interrupt the stream. It deploys more easily than WebSockets, needs no connection held open between messages, and works well on stable networks.

The constraints appear when your application starts adding agentic behavior: tool calls, mid-stream cancellation, multi-device continuity, and background tasks that complete while the user is offline. At that point, SSE’s unidirectional architecture stops being a trade-off and becomes a blocker.

What timeout values should I configure to prevent AI connection drops in production?

Set your AWS ALB idle timeout to at least 3,600 seconds for WebSocket connections. The 60-second default was designed for HTTP requests, not long-running agent tasks. On Cloudflare Free and Pro plans, the WebSocket timeout is fixed at 100 seconds. Send heartbeat pings at around 25-second intervals to stay well below that threshold.

For Nginx, the equivalent setting is proxy_read_timeout, which also defaults to 60 seconds. These three changes cover most production timeout failures for AI chat deployments.
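
A minimal sketch of the heartbeat side, assuming the 25-second figure comes from taking a quarter of the strictest idle timeout on the path (an assumption chosen to reproduce that number). The browser WebSocket API has no method for sending protocol-level ping frames, so the heartbeat here is an ordinary application message that the server is assumed to ignore. `heartbeatIntervalMs`, `startHeartbeat`, and the `{ type: "heartbeat" }` payload are hypothetical.

```typescript
// Picks a heartbeat interval well below the strictest idle timeout on the network path.
// The default fraction (a quarter) is an assumption that reproduces the ~25 s figure.
function heartbeatIntervalMs(idleTimeoutsSec: number[], fraction = 0.25): number {
  return Math.floor(Math.min(...idleTimeoutsSec) * fraction * 1000);
}

// The only part of a WebSocket this sketch needs; a browser WebSocket satisfies it.
interface Sendable {
  send(data: string): void;
}

// Sends an application-level heartbeat message on a fixed interval.
// Returns a function that stops the heartbeat, e.g. when the socket closes.
function startHeartbeat(socket: Sendable, intervalMs: number): () => void {
  const timer = setInterval(() => {
    socket.send(JSON.stringify({ type: "heartbeat", sentAt: Date.now() }));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

With the ALB raised to 3,600 seconds and Cloudflare in front, `heartbeatIntervalMs([3600, 100])` gives 25,000 ms; behind an ALB still on its 60-second default, the same rule gives 15,000 ms. Keeping the connection busy only prevents idle closes; it does nothing for the other drop causes.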

Does reconnection logic solve the session recovery problem?

Reconnection logic solves the transport problem. It doesn’t solve the state problem. Exponential backoff re-establishes the socket, and heartbeats keep it from idling out or reveal when it has died.

But they can’t recover tokens generated during the gap, tool call results that arrived while the client was disconnected, or context accumulated across multiple steps. Preventing duplicate messages on reconnect requires sequence numbers or idempotency keys at the session layer, not the WebSocket layer. A client that reconnects without a session layer arrives at an empty context and either loses the conversation or restarts it.
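
Sequence numbers cover the server-to-client direction; for messages the client sends, a common approach is an idempotency key per message. A minimal sketch, assuming the client attaches a unique key (for example from `crypto.randomUUID()`) and resends with the same key after a reconnect. `IdempotentInbox` is a hypothetical name, and its bounded in-memory set is a simplification of what a real session layer would persist.

```typescript
// Server-side guard against a client resending the same message after a reconnect.
class IdempotentInbox {
  private readonly seen = new Set<string>();
  private readonly order: string[] = [];
  private readonly limit: number;

  constructor(limit = 10000) {
    this.limit = limit;
  }

  // Returns true the first time a key is seen and false for any retry of it.
  accept(key: string): boolean {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.order.push(key);
    // Forget the oldest key once the window is full, so memory stays bounded.
    if (this.order.length > this.limit) {
      const evicted = this.order.shift();
      if (evicted !== undefined) this.seen.delete(evicted);
    }
    return true;
  }
}
```

The memory bound is a deliberate trade-off: a retry that arrives after its key has been evicted is accepted again, so the window has to cover the longest reconnect you expect.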

How does Ably replay missed messages after a WebSocket reconnect?

Ably assigns every published message a serial number. When a client reconnects, the transport layer uses Ably’s untilAttach history option internally to fetch messages published during the gap. This bounds the history query to the exact reconnection point.

Ably delivers everything missed in order, with no overlap between historical and live messages. The client doesn’t track its own offset or implement catch-up logic. Every plan includes two minutes of ephemeral history by default. Persisted channels extend this to 72 hours on Standard plans, or up to 365 days on Pro and Enterprise plans.
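
The no-gap, no-overlap property can be illustrated with a small model. This is a conceptual sketch, not Ably’s implementation: it treats the channel as a list of messages with integer serials and the attach point as the serial where live delivery begins. Because history is bounded to messages before that point, history and live delivery partition the log exactly. `catchUp` and its parameters are hypothetical.

```typescript
// A channel message as this model sees it.
interface ChannelMessage {
  serial: number;
  text: string;
}

// Model of a reconnect: `log` is everything published on the channel, `lastSeen` is the
// last serial the client received, and `attachSerial` is the first serial delivered live.
function catchUp(log: ChannelMessage[], lastSeen: number, attachSerial: number) {
  // History query bounded at the attach point: only what was missed before re-attaching.
  const fromHistory = log.filter((m) => m.serial > lastSeen && m.serial < attachSerial);
  // Live delivery starts exactly at the attach point, so nothing arrives twice.
  const live = log.filter((m) => m.serial >= attachSerial);
  return { fromHistory, live, all: [...fromHistory, ...live] };
}
```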