AI Transport exists because the default pattern for interactive AI experiences, direct streamed HTTP requests between clients and agents, limits the quality and richness of the interactions you can build.
Most AI frameworks support simple client-driven interactions, with responses streamed from the agent via server-sent events (SSE) or similar HTTP streaming. The client's request is handled by an agent instance, which pipes tokens back in the response. This approach is easy to implement, surprisingly effective for basic interactions, and supported by every framework. However, the simplicity of the pattern is also the source of its limitations.
Limitations of HTTP streaming
HTTP streaming has several limitations, all of which arise from the coupling between the client-agent interaction and the transport layer.
Streams fail on disconnection
The operation of a response stream is tied to the health of the underlying connection. When the connection drops, the response stream fails.
This happens all the time in practice: a phone switches from Wi-Fi to cellular, a user refreshes the page, a laptop lid closes mid-response. The LLM continues to generate tokens, but there is no longer a connection to deliver them over, so they are lost.
The SSE protocol does include a resumption mechanism: a reconnecting client can specify a position in the stream to resume from. In practice, however, this is rarely supported, because it requires a significant increase in backend complexity. To support resumable streams with SSE, you would need to assign sequence numbers to token events for ordering, buffer those events in an external store, and build a resume handler. This is a big departure from a simple, stateless request handler. Even having done that, you have only addressed part of the problem: the solution still would not support continuity of streams across a page refresh, because SSE does not support that.
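To make the scale of that departure concrete, here is a minimal sketch of the server-side state that resumable SSE would require. All names here (`TokenBuffer`, `resumeFrom`) are illustrative, not from any framework, and a real implementation would keep the buffer in an external store rather than in memory:

```typescript
// Minimal sketch of server-side state needed for resumable SSE.
// Names are illustrative, not from any real framework.

interface TokenEvent {
  id: number;   // sequence number, sent as the SSE `id:` field
  data: string; // token text
}

class TokenBuffer {
  private events: TokenEvent[] = [];
  private nextId = 0;

  // Called as the LLM produces tokens. In production this buffer
  // would live in an external store, not in process memory.
  append(data: string): TokenEvent {
    const event = { id: this.nextId++, data };
    this.events.push(event);
    return event;
  }

  // Resume handler: a reconnecting client sends the last event id it
  // received (the Last-Event-ID header) and gets everything after it.
  resumeFrom(lastEventId: number): TokenEvent[] {
    return this.events.filter((e) => e.id > lastEventId);
  }
}

// Usage: the client received events 0 and 1, then disconnected.
const buffer = new TokenBuffer();
buffer.append("Hello");
buffer.append(", ");
buffer.append("world");
const missed = buffer.resumeFrom(1);
console.log(missed.map((e) => e.data).join("")); // "world"
```

Even this sketch shows why teams skip it: the handler is no longer stateless, and the buffer's lifecycle, storage, and cleanup all become your problem.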
Sessions don't span devices
With HTTP streaming, the connection is exclusively between the requesting client and the agent. A second tab or a phone can't access that same stream. It only exists for the client that initiated the request.
In reality, users move between surfaces constantly, whether that's a second browser tab or an app on their phone. Without shared access to sessions, each surface is isolated: a new client has no way to see the in-progress stream, the conversation history, or the session's current state.
Clients can't reach the agent
An SSE request initiated by the client is one-way: server to client. The client has no way to send a signal to the agent through the same connection once the initial request has been made. The only thing the client can do is to read the stream to completion, or cancel it by closing the connection.
Having cancellation as the sole signal to the agent creates a fundamental conflict. Take a 'stop' button that ends an in-progress stream, for example. You could implement it by closing the connection, but if a closed connection is interpreted as a cancellation, then every accidental disconnect also stops the LLM response, and you lose the ability to resume the stream.
Even with a bidirectional transport between client and agent, such as WebSockets, the connection would still be an exclusive pipe. Other devices have no upstream channel to the agent, so you can't interrupt or steer from a second device.
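One way out of this conflict is to treat cancellation as an explicit signal, distinct from the connection's health. The sketch below illustrates the separation; all names are illustrative, not from any real SDK:

```typescript
// Sketch of separating transport events from user intent. When the
// connection is the only signal, "the user pressed stop" and "the
// network dropped" are indistinguishable. With an explicit cancel
// signal, a disconnect can leave generation running and buffering.
// Names are illustrative, not from any real SDK.

type TurnState = "streaming" | "cancelled";

class Turn {
  state: TurnState = "streaming";
  buffered: string[] = [];

  onToken(token: string): void {
    if (this.state !== "streaming") return; // cancelled: stop the turn
    this.buffered.push(token);              // connected or not, keep buffering
  }

  // Transport-level event: the pipe died. Generation is unaffected.
  onDisconnect(): void {}

  // Explicit, in-band user intent: actually stop the turn.
  onCancelSignal(): void {
    this.state = "cancelled";
  }
}

const turn = new Turn();
turn.onToken("The");
turn.onDisconnect();     // page refresh, network blip, ...
turn.onToken(" answer"); // still buffered for a later resume
turn.onCancelSignal();   // user pressed stop
turn.onToken(" is");     // dropped: the turn is cancelled
console.log(turn.buffered.join("")); // "The answer"
```

The catch, as the surrounding text notes, is that an SSE or WebSocket connection gives you no in-band channel to carry that cancel signal; it has to travel some other way.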
Multi-agent architectures are complex
In multi-agent systems, an orchestrator handles the client's connection and delegates to specialized subagents. When there is an exclusive, point-to-point connection between the client and the orchestrator agent, all interactions with subagents must be proxied by the orchestrator. If you want users to see intermediate events from subagents, such as progress or partial responses, every update must be mediated by the orchestrator, adding complexity and coupling.
Durable sessions
These problems all stem from the coupling between client-to-agent interaction and the transport layer used to mediate that interaction. The transport, which includes the connection, request and streamed response, is ephemeral and only exists for the lifetime of that single interaction. It is also exclusive, so no other agent or client instance can interact with it.
The pattern that engineering teams are adopting to solve these problems is to break that coupling, through the idea of a durable session: a shared, persistent medium through which clients and agents interact, instead of an exclusive pipe between one client and one agent.
Using a durable session:
- The agent writes events to the session.
- Clients independently connect to the session.
- The session persists across connection drops and device switches, and can be resumed at a later time.
- Any participant can publish to the session, enabling bidirectional control.
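The behaviour described by these points can be modelled as an ordered, append-only event log. Below is a minimal in-memory sketch (illustrative names, not a real API; a real implementation would persist the log and fan events out over the network):

```typescript
// Minimal in-memory model of a durable session: an ordered, append-only
// event log that any participant can publish to, and that any client
// can read from any position. Names are illustrative, not a real API.

interface SessionEvent {
  seq: number;  // position in the session, used to resume
  from: string; // e.g. "agent", "client:laptop", "client:phone"
  name: string; // e.g. "token", "cancel"
  data: string;
}

class DurableSession {
  private log: SessionEvent[] = [];
  private listeners = new Set<(e: SessionEvent) => void>();

  // Any participant, agent or client, can publish.
  publish(from: string, name: string, data: string): void {
    const event: SessionEvent = { seq: this.log.length, from, name, data };
    this.log.push(event);                      // outlives any connection
    this.listeners.forEach((fn) => fn(event)); // live fan-out to all devices
  }

  // A new or reconnecting client replays from any position, then
  // continues to receive live events in publish order.
  subscribe(fromSeq: number, fn: (e: SessionEvent) => void): () => void {
    this.log.filter((e) => e.seq >= fromSeq).forEach(fn);
    this.listeners.add(fn);
    return () => this.listeners.delete(fn);
  }
}

// Usage: the agent streams tokens; a phone joins mid-stream, still sees
// the full history, and publishes a cancel signal back upstream.
const session = new DurableSession();
session.publish("agent", "token", "Hello");
session.publish("agent", "token", " world");

const seenByPhone: string[] = [];
session.subscribe(0, (e) => seenByPhone.push(e.data)); // late joiner
session.publish("client:phone", "cancel", "");         // bidirectional

console.log(seenByPhone); // ["Hello", " world", ""]
```

Note how the limitations from earlier each disappear: resume is a replay from a sequence number, multi-device sync is just another subscriber, and cancellation is an event in the same log rather than a dropped connection.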
This changes what's possible:
| Feature | Direct HTTP | Durable session |
|---|---|---|
| Resume after disconnect | Build from scratch: buffer, sequence and resume handler. | Automatic. Client reconnects and picks up where it left off. |
| Multi-device sync | Not possible without custom infrastructure. | Any device subscribes to the same session. |
| Cancel mid-stream | Close the connection (loses resume). | Publish a cancel signal. Stream and session survive. |
| Steer or interrupt | Requires a separate back channel. | Signal the agent through the session. |
| Multi-agent visibility | Route all updates through orchestrator. | Each agent publishes directly to the session. |
How AI Transport implements this
Ably AI Transport implements durable sessions on top of Ably channels. Ably channels provide the properties that a durable session requires:
- Any client or agent connects to the session by specifying a channel name.
- Messages on the channel outlive any single connection, device, or agent.
- Events are received by subscribers in the order that they were published, even if there are disconnections.
- A client that drops its connection automatically reconnects and picks up where it left off.
- Any participant can publish to the channel, so cancel, steer, and interrupt signals can all flow through the same session.
- Multiple participants subscribe to the same channel, and every participant sees every event.
In addition to these channel properties, the AI Transport SDK adds:
- Turns that structure prompt-response cycles with clear boundaries, concurrent lifecycles, and scoped cancellation.
- A codec layer that maps between your AI framework's event types and Ably messages.
- A conversation tree that supports branching, edit, regenerate, and history navigation.
- React hooks for building UIs with streaming, pagination, and branch navigation.
- Adapters that drop into various frameworks; for example, AI Transport can be used with Vercel AI SDK's `useChat` with one line of code.