WebSocket reconnection in AI agents: transport recovery vs. session recovery

AI agents go quiet mid-execution in a way standard apps don't - triggering timeouts at the worst possible moment. This piece covers why reconnection alone doesn't recover the session, and what actually fixes it.

Your AI agent is mid-task, waiting on the result of a search tool call it made 30 seconds ago. The user is watching a spinner. Then a network blip drops the connection. 

The application reconnects in under a second, fast enough that most monitoring wouldn't flag it. But the tool call result that came back during the gap is gone, and so are the 200 tokens the agent generated before the silence began.

The reconnect succeeded - but the session didn't.

This piece covers why reconnection issues are more difficult to anticipate for AI agents than standard WebSocket applications, and how to resolve both the transport and session recovery sides of the problem.

Key takeaways

  • AWS Application Load Balancer closes idle WebSocket connections after 60 seconds by default. An agent waiting on a tool call can stay silent for longer than that, which makes the ALB idle timeout one of the most common timeout sources in production AI applications.
  • Transport reconnection re-establishes the WebSocket. It does not replay tokens generated during the gap, tool call results that arrived while the client was offline, or agent context accumulated during the disconnect.
  • Cloudflare proxies WebSocket connections with a 100-second idle timeout on Free and Pro plans. Configuring the WebSocket server to send ping frames every 50 seconds keeps connections alive below both the ALB and Cloudflare thresholds, with browsers responding automatically.
  • Session recovery requires a layer that stores what the transport cannot carry: the agent's in-flight output, tool call results that arrived during the gap, and the position in the ongoing generation.

Why AI connections time out differently in production

WebSocket reconnection isn't a new failure mode. It has always been a problem worth solving. What makes AI agents different is what triggers the disconnect.

A standard chat interface goes quiet between user interactions, when there's genuinely nothing happening. An agent goes quiet mid-execution: during tool call waits, between reasoning steps, while the LLM is generating a response. That silence is the agent doing its most intensive work - but to every load balancer and proxy in the path, it looks idle.

When a WebSocket connection carries no data for a defined period, the load balancer closes it. This is expected behaviour - AWS ALB defaults to 60 seconds, Cloudflare Free and Pro plans enforce 100 seconds - but neither threshold was configured with AI workloads in mind. The result is that connections drop not during inactivity, but mid-execution, when the cost isn't just a reconnect: it's lost context, dropped tool results, and a generation that can't resume.

Why SSE doesn’t fully solve this

When the connection drops, reconnecting via WebSocket or SSE restores the transport. SSE has a genuine advantage here: its Last-Event-ID header is a native catch-up mechanism that lets clients request events from a specific point, which is why many developers reach for SSE and consider the problem solved. But it isn't. 

Last-Event-ID handles the transport gap, but only if the agent's output was stored server-side in the first place. And it has to be stored in a format the client can map back to UI state, with deduplication handled for clients that reconnect mid-stream or reload the page. That's the session recovery problem, and it's a separate layer from the transport.
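To make that dependency concrete, here is a minimal sketch of the server-side storage a Last-Event-ID resume relies on. The EventLog class, its method names, and the integer IDs are illustrative assumptions, not part of any SSE library; only the id:/data: wire format and the Last-Event-ID header come from the SSE specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EventLog:
    """In-memory log of one agent response stream (hypothetical helper)."""
    events: List[str] = field(default_factory=list)

    def append(self, data: str) -> int:
        """Store an event and return its ID (1-based position in the log)."""
        self.events.append(data)
        return len(self.events)

    def replay_after(self, last_event_id: Optional[str]) -> List[Tuple[int, str]]:
        """Return the (id, data) pairs the client has not seen yet."""
        start = int(last_event_id) if last_event_id else 0
        return [(i + 1, d) for i, d in enumerate(self.events) if i + 1 > start]


def to_sse(event_id: int, data: str) -> str:
    """Serialise one event in SSE wire format (one data: line per text line)."""
    body = "".join(f"data: {line}\n" for line in data.split("\n"))
    return f"id: {event_id}\n{body}\n"
```

If nothing calls append() while the client is offline, replay_after() has nothing to return, which is the point: the header only identifies where to resume, and the log is what makes resuming possible. A production version would also need persistence and a retention policy, which this sketch leaves out.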

The infrastructure timeout sources that hit AI agents, and other connection challenges

The most common sources of idle timeout failures in production are AWS ALB, Cloudflare, and Vercel. Each has different thresholds and fixes.

AWS ALB: 60-second idle timeout

The AWS Application Load Balancer idle timeout defaults to 60 seconds. When no data crosses the connection in either direction during that window, ALB closes it silently. No FIN frame, no RST, no onclose event fired on the client. The connection appears open until the application attempts to send or receive.

For an agent waiting on a downstream API call, 60 seconds of inactivity is routine. The window closes during a database lookup, a web search, a third-party webhook, or a slow inference step.

Only one change is required on the server side. Configure your server to send WebSocket ping frames at an interval shorter than the ALB timeout. Browsers respond to ping frames automatically with pong frames, which count as activity and reset the idle timer. A ping interval of 50 seconds keeps connections alive below the 60-second ALB default without any client-side code. Raise the idle_timeout.timeout_seconds attribute in your ALB configuration if your workload requires longer windows. It is adjustable up to 4,000 seconds.

Server-side ping frames handle the ALB. For upstream proxies and CDN edge nodes you cannot configure, the same server-side ping approach applies; the pong response from the browser resets their idle timers as well.
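The arithmetic and the loop can be sketched as follows. This is a minimal illustration, not a drop-in implementation: send_ping is a placeholder for whatever ping call your WebSocket server library exposes, and the 10-second margin is an assumption chosen to reproduce the 50-second figure above. Many server libraries offer a built-in ping interval setting, which is preferable where it exists.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


def ping_interval(idle_timeouts: Iterable[float], margin: float = 10.0) -> float:
    """Pick an interval safely below the shortest idle timeout in the path."""
    interval = min(idle_timeouts) - margin
    if interval <= 0:
        raise ValueError("margin leaves no room below the shortest idle timeout")
    return interval


async def keepalive(send_ping: Callable[[], Awaitable[None]], interval: float) -> None:
    """Send a ping every `interval` seconds until cancelled or a send fails."""
    while True:
        await asyncio.sleep(interval)
        # If this raises, the connection is already gone; let the caller handle it.
        await send_ping()
```

With the ALB default and the Cloudflare Free/Pro limit as inputs, ping_interval([60, 100]) gives 50 seconds. Run keepalive() as a background task per connection and cancel it when the socket closes.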

Cloudflare: 100-second idle timeout

Cloudflare proxies WebSocket connections on all plans, but Free and Pro plans enforce a fixed 100-second idle timeout. The limit cannot be raised on those plans; Enterprise customers can configure a custom value through their account team. For applications behind Cloudflare CDN, any agent response that pauses for more than 100 seconds without sending data loses the connection.

The only lever on Free and Pro plans is the server-side ping interval. Configuring your WebSocket server to send ping frames every 50 seconds keeps connections alive, because that interval sits below both the Cloudflare limit and the 60-second ALB default, and browsers handle the pong response automatically.

If connections die at exactly 100 seconds regardless of application-level configuration changes, Cloudflare is the likely source. The EdgeStartTimestamp and EdgeStopTimestamp fields in Cloudflare’s HTTP request logs measure WebSocket connection duration from upgrade to close. A consistent 100-second pattern in those fields confirms the diagnosis.
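A rough way to check for that pattern is to compute connection durations from exported log records and measure how many cluster around 100 seconds. The sketch below assumes the records have already been filtered to WebSocket upgrades and that both timestamps were exported as Unix seconds; the function names and the 2-second tolerance are illustrative choices.

```python
from typing import Iterable, List, Mapping


def connection_durations(records: Iterable[Mapping[str, float]]) -> List[float]:
    """Duration in seconds of each logged connection, from upgrade to close."""
    return [r["EdgeStopTimestamp"] - r["EdgeStartTimestamp"] for r in records]


def share_near(durations: List[float], target: float = 100.0,
               tolerance: float = 2.0) -> float:
    """Fraction of connections that closed within `tolerance` seconds of `target`."""
    if not durations:
        return 0.0
    hits = sum(1 for d in durations if abs(d - target) <= tolerance)
    return hits / len(durations)


# Made-up sample records for illustration only.
sample = [
    {"EdgeStartTimestamp": 1000.0, "EdgeStopTimestamp": 1100.2},
    {"EdgeStartTimestamp": 2000.0, "EdgeStopTimestamp": 2100.1},
    {"EdgeStartTimestamp": 3000.0, "EdgeStopTimestamp": 3012.5},
    {"EdgeStartTimestamp": 4000.0, "EdgeStopTimestamp": 4099.8},
]
clustered = share_near(connection_durations(sample))
```

A high share of closes near the 100-second mark points at the Cloudflare idle timeout rather than at application code.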

Vercel: function limits and why WebSockets need a separate host

The constraint that matters for WebSockets on Vercel is structural, not configurable. Vercel serverless functions cannot hold a WebSocket connection open. Each function invocation terminates after it responds, and there is no persistent process to maintain the socket. This applies even with Fluid Compute enabled.

Vercel's function duration limits apply to SSE streaming and long-running HTTP responses: 300 seconds on Hobby plans and 800 seconds on Pro plans with Fluid Compute (the default since April 2025). They are not relevant to WebSocket connections, which Vercel cannot host regardless of duration.

Vercel's own documentation confirms this and recommends third-party providers for any application that needs persistent connections. For applications using Vercel AI SDK, the ChatTransport interface is the plug-in point for swapping the default HTTP transport for a WebSocket-based provider, without changing your agent code or UI.

For AI applications on Vercel that need WebSocket connections, you need a durable session layer, such as Ably. The AI Transport guide for Vercel AI SDK covers the full integration.

Other connection challenges to consider

Not all connection failures come from timeouts. Two other patterns hit AI agent applications in production and require different handling.

Corporate VPN and enterprise proxy traversal. Many enterprise networks do not forward the HTTP Upgrade header required to open a WebSocket connection, so the connection never opens (rather than dropping mid-session). The failure appears as a refused connection before any data flows, not a silent close after inactivity. A timeout produces an established connection that dies after a predictable interval; a proxy block fails at the WebSocket handshake stage, typically returning a non-101 HTTP response. The fix is protocol fallback: when a proxy blocks the WebSocket upgrade, the transport degrades automatically to HTTP streaming or long-polling without per-deployment configuration.
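The fallback logic itself is small. The sketch below tries an ordered list of transports and moves on when the handshake is rejected; HandshakeRejected and the connector callables are hypothetical stand-ins for whatever your client library raises and exposes, not a real API.

```python
from typing import Callable, List, Sequence, Tuple


class HandshakeRejected(Exception):
    """Raised when the upgrade request gets a non-101 response (hypothetical)."""


def connect_with_fallback(
    transports: Sequence[Tuple[str, Callable[[], object]]],
) -> Tuple[str, object]:
    """Try each transport in order and return the first one that connects."""
    errors: List[str] = []
    for name, connect in transports:
        try:
            return name, connect()
        except HandshakeRejected as exc:
            # A rejected handshake is not a timeout: fall through to the next option.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all transports failed: " + "; ".join(errors))
```

Calling it with WebSocket first, then HTTP streaming, then long-polling reproduces the degradation order described above. Only handshake rejections trigger the fallback here; an established connection that later drops should go through the normal reconnect path instead.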

Mobile network handoffs. Switching from Wi-Fi to cellular drops the underlying TCP connection immediately. The WebSocket goes with it, but the client’s onclose event often does not fire, because the OS tears the connection down without a clean close frame. On iOS, background TCP connections are suspended within seconds of an app moving to the background, again without notifying the client. The recovery pattern is the same as for any disconnect: reconnect and request messages from the last received serial. The key is not to rely on onclose to trigger reconnection; combine failed-send detection with an application-level heartbeat timeout to catch the cases where onclose never fires.
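Combining the two signals might look like the sketch below. HeartbeatWatchdog and its method names are illustrative, and the injectable clock exists only to make the logic easy to test; a real client would call should_reconnect() from a timer and wire the two callbacks to its socket library.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Flags a connection as dead without waiting for a close event."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()
        self.send_failed = False

    def on_message(self) -> None:
        """Call for every inbound message, including heartbeats."""
        self.last_seen = self.clock()

    def on_send_error(self) -> None:
        """Call when a send raises or times out."""
        self.send_failed = True

    def should_reconnect(self) -> bool:
        """True if a send failed or nothing has arrived within the timeout."""
        silent_for = self.clock() - self.last_seen
        return self.send_failed or silent_for > self.timeout
```

The heartbeat timeout should be longer than the server's heartbeat interval, otherwise a healthy but quiet connection gets torn down.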

What transport reconnection recovers, and what it doesn’t

Reconnecting the WebSocket connection restores the transport, but it doesn’t restore the state of the session that was in flight when the connection dropped. The distinction is worth stating precisely, because the failure looks like a transport problem, but its cost is a state problem.

What transport reconnection recovers:

  • The WebSocket connection itself
  • Active session subscriptions
  • The ability to send and receive new messages
  • The session ID and session name

What it doesn’t recover:

  • Tokens generated while disconnected
  • Tool call results that arrived during the gap
  • The agent’s reasoning trace if streamed as events
  • The position in the ongoing generation

After a successful reconnect with only transport-layer recovery, the client is back online, but the session is in an indeterminate state. The client holds a partial response from before the disconnect. The agent continued generating on the server side. Neither side knows where the other stopped.

The session recovery layer is the infrastructure needed to store what the transport layer cannot carry through a disconnect.

The session layer: storing and replaying in-flight state

This is where Ably AI Transport comes in. AI Transport is the session and delivery layer for AI applications. It sits between your agent and your users, handling the recovery concerns that would otherwise fall to application code.

Reconnection and recovery

AI Transport's reconnection and recovery is built for the interruption patterns that agents encounter in production: brief network drops, mobile handoffs, page reloads mid-stream, and users returning to a conversation after going offline. The agent keeps publishing regardless of client connectivity. When the client reconnects, AI Transport delivers everything the client missed, in order, without the agent having to regenerate anything.

History and replay

AI Transport's history and replay ensures clients catch up on everything they missed regardless of how long they were offline. It works because tokens are stored as appends to a single message per agent response rather than as individual token events. When a client reconnects or refreshes, it receives one clean, accumulated message per response and resumes from there, with no reconstruction logic required. 
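The difference between per-token events and appends to a single message can be illustrated with a toy store. This is not the AI Transport API or its storage model; ResponseHistory and its methods are hypothetical names used only to show the shape of the idea.

```python
from typing import Dict, List, Tuple


class ResponseHistory:
    """Stores one accumulated message per agent response (illustrative only)."""

    def __init__(self) -> None:
        # Plain dicts preserve insertion order, so responses stay in sequence.
        self._messages: Dict[str, str] = {}

    def append(self, response_id: str, token: str) -> None:
        """Append a token to the message for this response, creating it if new."""
        self._messages[response_id] = self._messages.get(response_id, "") + token

    def snapshot(self) -> List[Tuple[str, str]]:
        """What a reconnecting client would receive: one message per response."""
        return list(self._messages.items())
```

After appending "Hel" and "lo" to one response ID, snapshot() holds a single "Hello" entry for it, so a client that reloads mid-stream has nothing to stitch together or deduplicate.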

What the user should see during a disconnect

Storing and replaying in-flight state is the infrastructure side of session recovery. The other side is what you surface to the user while it's happening - because a reconnect that works silently in the background still needs the right UI treatment to avoid looking like a failure.

AI Transport exposes well-defined connection states. The key distinction is between the disconnected state (temporarily offline, retrying automatically) and the suspended state (retry window exhausted). While disconnected, show a reconnecting indicator rather than an error modal. When suspended, show a retry button to communicate that the session is intact and waiting.

Figure: AI Transport connection state machine (connecting, connected, disconnected, suspended)
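Mapping those states to UI treatment can be as small as the sketch below. The four state names come from the state machine above; the enum, the function, and the returned component names are hypothetical, and showing nothing while connecting or connected is an assumption rather than documented guidance.

```python
from enum import Enum


class ConnectionState(Enum):
    CONNECTING = "connecting"
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    SUSPENDED = "suspended"


def ui_treatment(state: ConnectionState) -> str:
    """Return the name of the UI element to show for a connection state."""
    if state is ConnectionState.DISCONNECTED:
        # Temporarily offline and retrying automatically: reassure, don't alarm.
        return "reconnecting-indicator"
    if state is ConnectionState.SUSPENDED:
        # Retry window exhausted: the session is intact but needs a user action.
        return "retry-button"
    # Assumption: no extra UI while connecting or connected.
    return "none"
```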

Ably AI Transport is the session recovery layer

Building session recovery without AI Transport means writing a heartbeat loop, a reconnection manager, manual state reconstruction logic, and a connection state component to surface each phase to the user.

None of these is large in isolation, but together they constitute infrastructure. And any infrastructure that your team owns is infrastructure your team spends time and resources maintaining and extending as requirements change.

Ably AI Transport provides the session recovery layer:

  • Automatic connection recovery when the client reconnects within two minutes of the drop
  • History compaction and replay so clients always receive clean, accumulated state on reconnect
  • Protocol fallback from WebSocket to HTTP streaming to long-polling
  • Bidirectional signaling on the same session

What remains in your application code is the connection state UI (surfacing the reconnecting and suspended states to the user), and that’s a handful of lines rather than a system.

Frequently asked questions

How do I stop AI chat sessions from timing out?

Configure your WebSocket server to send ping frames at an interval shorter than the shortest idle timeout in the path. A 50-second ping interval keeps connections alive because it sits below both the 60-second ALB default and Cloudflare's fixed 100-second limit on Free and Pro plans. Browsers respond to ping frames automatically with pong frames, so no client-side code is required. If your workload needs a longer window, raise the idle_timeout.timeout_seconds attribute in your ALB configuration; it is adjustable up to 4,000 seconds.

What happens if a user disconnects during LLM streaming?

With AI Transport, the session resumes automatically upon reconnect, with missed tokens delivered in order before new ones arrive, and no application code needed. For longer disconnects, AI Transport's history and replay feature loads the full conversation from the session history. Without a session layer, tokens generated during the dropout are lost, and the agent cannot resume from the point of interruption.

How do I avoid duplicate AI messages after a WebSocket reconnect?

With AI Transport you don't need to - the SDK handles this through history compaction. Tokens are streamed as appends to a single message per agent response, and the session history stores one message per response rather than one per token. When a client reconnects or refreshes, it receives the single accumulated message rather than individual tokens to reconstruct.

What is the AWS ALB idle timeout, and how do I raise it for WebSocket connections?

The AWS Application Load Balancer idle timeout defaults to 60 seconds and applies to all connection types, including WebSocket. Raise it by updating the idle_timeout.timeout_seconds load balancer attribute. The valid range is one to 4,000 seconds; most AI agent workloads are well served by a value between 3,600 and 4,000 seconds. The change takes effect immediately without requiring a redeployment.
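As a sketch of that update in Python, the helper below validates the value and builds the attribute list. The helper name is illustrative; the boto3 call in the trailing comment assumes boto3 is installed with credentials configured, and load_balancer_arn is a placeholder for your own ALB's ARN.

```python
from typing import Dict, List


def idle_timeout_attribute(seconds: int) -> List[Dict[str, str]]:
    """Build the attribute list for an ALB idle timeout change."""
    if not 1 <= seconds <= 4000:
        raise ValueError("ALB idle timeout must be between 1 and 4000 seconds")
    # Attribute values are passed as strings.
    return [{"Key": "idle_timeout.timeout_seconds", "Value": str(seconds)}]


# The list could then be applied with boto3, for example:
#   boto3.client("elbv2").modify_load_balancer_attributes(
#       LoadBalancerArn=load_balancer_arn,  # placeholder
#       Attributes=idle_timeout_attribute(3600),
#   )
```

Raising the ALB timeout does not change the limits of anything else in the path, so the server-side ping interval is still worth keeping.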

Does Cloudflare close WebSocket connections? What is the timeout?

Yes. Cloudflare enforces a 100-second idle timeout on WebSocket connections for Free and Pro customers. The limit is fixed on those plans and cannot be raised. Enterprise customers can configure a custom value through their account team. To keep connections alive on Free and Pro plans, configure your WebSocket server to send ping frames every 50 seconds. Browsers respond automatically with pong frames, which reset both Cloudflare's idle timer and the ALB's 60-second idle timer.

Can WebSockets work behind a corporate VPN or enterprise proxy?

They can, but many enterprise proxies do not forward the HTTP Upgrade header required to open a WebSocket connection. When that happens, the connection fails at the handshake stage rather than dropping mid-session. That failure is distinct from a timeout: the error occurs before any data flows, not after a period of inactivity. Protocol fallback to HTTP streaming or long-polling handles proxy blocking at the infrastructure layer without per-deployment configuration.

How long does Ably retain channel history for session recovery?

Session history persists for 24 to 72 hours depending on your Ably plan, with extended retention available on higher tiers. Short disconnects don't depend on it: AI Transport replays missed messages automatically on reconnect, with no application code needed. For longer disconnects, session history loads the full conversation.