AI agent streaming in action: barge-in, human handover, and session continuity

You're mid-conversation with an AI support agent. You've explained the problem, the agent is halfway through a response, and the connection drops. When you reconnect, the response is gone.

You type the same question again. The agent asks the same clarifying questions again. Three minutes of context, gone. Not because the model forgot it, but because the delivery layer stored nothing.

Connection drops, page refreshes, and device switches all lose the session for the same reason: session state lives in the delivery connection, not independently of it.

Ably AI Transport fixes this by storing the session in a channel that outlasts any individual connection. The demo below shows how this works for the primitives that most production teams end up building from scratch: barge-in, human handover, and multi-agent coordination.

Key takeaways

Connection drops restart most AI streams from scratch. Ably AI Transport buffers session output in the channel, so clients reconnect and catch up without re-running inference.
Barge-in requires a bi-directional channel. Server-Sent Events (SSE) can't distinguish a user interrupt from a network drop; AI Transport delivers cancel and redirect as explicit channel signals the agent acts on.
Organization-side human handover, where a supervisor joins a live session on a different device hours later, is the HITL case most frameworks leave unsolved. AI Transport's durable session persists the pending approval in channel history until the right person responds.

In this video, Mike Christensen (Pub/Sub team lead at Ably) walks through all of these primitives in a live multi-agent holiday planning app. The sections below follow the same chapter structure as the video.

Why AI agent streams break in production

Connection drops mid-stream. Standard HTTP streaming stores no session state server-side. When the connection closes, the tokens generated during the gap disappear: the delivery layer was never asked to hold them. The client reconnects to an empty state and re-prompts.

Page refresh loses the stream. Most AI implementations store token state in the browser: React component state, a JavaScript variable tracking the partial response. When the page reloads, that state is gone. The agent has no awareness that the client disappeared mid-generation, and no mechanism to re-stream output that it already produced.

Device switches lose the session. Sessions are tied to connections, and connections are tied to devices. Move from laptop to phone and the conversation doesn't follow. The new device has no path to the session's history.

All three share the same root cause. Generation state is coupled to a single delivery connection. Decoupling them, by storing the session in a channel that outlasts any individual connection, is what fixes all three at once.

How Ably AI Transport handles connection recovery and session continuity

Server-side buffering and offset-based replay. Every token the agent publishes goes to the session channel as it's generated, regardless of whether the client is connected. On reconnect, AI Transport uses untilAttach to deliver everything published during the gap, in order, before the live stream resumes. The LLM never re-runs; the client catches up.
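
The replay behavior can be pictured with a small in-memory model. This is a minimal sketch, not the AI Transport API: SessionChannel, publish, replayFrom, and ViewerClient are hypothetical names, and the numeric serial stands in for whatever offset the real channel tracks.

```typescript
// Hypothetical in-memory model of a session channel; not the Ably SDK.
interface StoredToken {
  serial: number; // position in the channel, assigned at publish time
  text: string;
}

class SessionChannel {
  private log: StoredToken[] = [];

  // The agent publishes every token, whether or not a client is attached.
  publish(text: string): number {
    const serial = this.log.length + 1;
    this.log.push({ serial, text });
    return serial;
  }

  // Everything published after the last serial the client saw, in order.
  replayFrom(lastSeenSerial: number): StoredToken[] {
    return this.log.filter((t) => t.serial > lastSeenSerial);
  }
}

class ViewerClient {
  lastSeenSerial = 0;
  rendered = "";

  receive(tokens: StoredToken[]): void {
    for (const t of tokens) {
      this.rendered += t.text;
      this.lastSeenSerial = t.serial;
    }
  }
}

function demoReplay(): string {
  const channel = new SessionChannel();
  const client = new ViewerClient();

  // Connected: the client receives the first two tokens.
  channel.publish("Your ");
  channel.publish("flight ");
  client.receive(channel.replayFrom(client.lastSeenSerial));

  // Connection drops: the agent keeps publishing into the channel.
  channel.publish("leaves ");
  channel.publish("at 9am.");

  // Reconnect: catch up from the last seen serial, with no second model call.
  client.receive(channel.replayFrom(client.lastSeenSerial));
  return client.rendered;
}
```

The point of the model is that the agent's publish path never checks who is listening. Recovery is entirely a read-side concern.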

Session on the channel, not the connection. The session lives in the channel, not in the connection that opened it. Any device subscribing to the same channel name joins the same session: full conversation history, followed by the live stream from its current position. Two browser tabs, a laptop and a phone, a page reload mid-response: all receive the same unbroken state.
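
To make the "same channel name, same session" idea concrete, here is a rough sketch under stated assumptions. NamedChannel, getChannel, and the channel name session:trip-42 are all hypothetical; the real SDK's attach and subscribe calls are not shown here.

```typescript
// Hypothetical model: sessions are keyed by channel name, not by connection.
type MessageListener = (message: string) => void;

class NamedChannel {
  private past: string[] = [];
  private listeners: MessageListener[] = [];

  publish(message: string): void {
    this.past.push(message);
    this.listeners.forEach((listener) => listener(message));
  }

  // A new subscriber first receives the history, then live messages.
  subscribe(listener: MessageListener): void {
    this.past.forEach((message) => listener(message));
    this.listeners.push(listener);
  }
}

const channelsByName: { [name: string]: NamedChannel } = {};

function getChannel(name: string): NamedChannel {
  let channel = channelsByName[name];
  if (!channel) {
    channel = new NamedChannel();
    channelsByName[name] = channel;
  }
  return channel;
}

function demoTwoDevices(): { laptop: string[]; phone: string[] } {
  const laptop: string[] = [];
  const phone: string[] = [];

  getChannel("session:trip-42").subscribe((m) => laptop.push(m));
  getChannel("session:trip-42").publish("user: find me a hotel");
  getChannel("session:trip-42").publish("agent: searching");

  // The phone joins mid-session under the same channel name.
  getChannel("session:trip-42").subscribe((m) => phone.push(m));
  getChannel("session:trip-42").publish("agent: found three options");

  return { laptop, phone };
}
```

Both devices end up with the same three messages, even though the phone subscribed after two of them were published.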

Channel history for context. When a client has been offline beyond the live recovery window, channel history provides the full conversation. Clients load older messages using view.loadOlder(), paginating back through the session until they have the full context. For users who are offline entirely, push notifications via FCM, APNs, or Web Push can deliver agent completions when they return. Push notification delivery is currently listed as Partial in the feature set.
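
Paging backwards through history could look something like the sketch below. It models the idea only: HistoryView is a hypothetical class, and the pageSize argument and boolean return value are assumptions, since the actual view.loadOlder() signature is not described here.

```typescript
// Hypothetical model of paging backwards through channel history.
// The real view.loadOlder() signature may differ; pageSize is an assumption.
class HistoryView {
  messages: string[] = []; // oldest first, as rendered
  private stored: string[];
  private cursor: number; // everything at or after this index is already loaded

  constructor(stored: string[]) {
    this.stored = stored;
    this.cursor = stored.length;
  }

  // Prepend the next older page; returns false once nothing older remains.
  loadOlder(pageSize: number): boolean {
    const size = Math.max(1, pageSize);
    const start = Math.max(0, this.cursor - size);
    const page = this.stored.slice(start, this.cursor);
    this.messages = page.concat(this.messages);
    this.cursor = start;
    return start > 0;
  }
}

// Keep paging until the whole session is loaded.
function loadFullContext(stored: string[], pageSize: number): string[] {
  const view = new HistoryView(stored);
  let hasMore = stored.length > 0;
  while (hasMore) {
    hasMore = view.loadOlder(pageSize);
  }
  return view.messages;
}
```

In a real UI you would more likely call the paging method on scroll than in a loop, so the client only fetches as much history as the user asks to see.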

In the demo, Mike refreshes the page mid-stream, and the response picks up exactly where it stopped. Two windows open side by side show the same in-progress response, updating simultaneously.

Session continuity is the infrastructure layer. Everything that happens on top of it (how users interact with agents in motion, how human operators step in, how multiple agents coordinate) depends on it being in place.

The next four sections cover the interaction patterns the demo shows: what the user sees while the agent works, how they interrupt or redirect it, how a human operator takes over with full context, and how multiple specialized agents surface progress independently. All four require the session to be live and visible.

Agent progress visibility: what the user sees while the agent works

A user can only meaningfully interrupt an agent they can see working. Progress visibility is the prerequisite for both barge-in and human handover. Without visibility, users have no basis for interrupting: they're canceling a process they can't see, with no information about whether to wait or redirect.

The demo surfaces four types of progress signal. Token streaming shows what the orchestrator is generating. Ably LiveObjects carries the structured progress state from each of the three specialist agents: flights, hotels, and activities. Presence shows which agents are active in the session, and task history shows what each has completed.

Each signal comes from a different source, and each arrives independently. All three specialist agents publish their progress directly, without routing through the orchestrator. So the user sees the live state from each agent simultaneously. Each agent also converts its raw query parameters into natural language using a separate model call: progress cards show "Searching for direct flights on the 14th" rather than a query object. That's what makes barge-in useful. The user's decision to interrupt is based on accurate realtime information, not a stale snapshot.
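
As a rough illustration of independent progress reporting, the sketch below has each specialist write its own progress card. Everything here is hypothetical: ProgressBoard stands in for the LiveObjects state, and describeFlightQuery is a plain string template standing in for the separate model call the demo uses.

```typescript
// Hypothetical progress model; a stand-in for the LiveObjects state in the demo.
type SpecialistName = "flights" | "hotels" | "activities";

interface ProgressCard {
  agent: SpecialistName;
  label: string; // natural-language description of the current step
  percent: number; // 0 to 100
}

interface FlightQuery {
  direct: boolean;
  dayOfMonth: number;
}

function ordinal(n: number): string {
  const lastTwo = n % 100;
  if (lastTwo >= 11 && lastTwo <= 13) return n + "th";
  const last = n % 10;
  if (last === 1) return n + "st";
  if (last === 2) return n + "nd";
  if (last === 3) return n + "rd";
  return n + "th";
}

// Stand-in for the separate model call that phrases raw query parameters.
function describeFlightQuery(query: FlightQuery): string {
  const kind = query.direct ? "direct flights" : "flights";
  return "Searching for " + kind + " on the " + ordinal(query.dayOfMonth);
}

class ProgressBoard {
  private cards: ProgressCard[] = [];

  // Each specialist replaces its own card directly; nothing is relayed.
  report(card: ProgressCard): void {
    this.cards = this.cards.filter((c) => c.agent !== card.agent);
    this.cards.push(card);
  }

  render(): string[] {
    return this.cards.map((c) => c.agent + ": " + c.label + " (" + c.percent + "%)");
  }
}
```

Because each agent only ever touches its own card, a slow hotel search never delays the flight card from updating.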

Barge-in: how users interrupt and redirect agents mid-response

In Ably's customer discovery research (which Ably's CEO, Matthew O'Riordan, walks through in this talk), interruption emerged as a critical piece of functionality once teams moved to asynchronous agent experiences. One team disabled user input entirely because of the limitations of SSE: a user's stop signal looks identical to a network drop, so there was no safe way to act on it.

AI Transport changes this because the channel is bi-directional. User input arrives as a specific channel event, not a connection side effect, so the agent can act on it reliably while remaining live.
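
One way to picture the difference is as a set of signals the agent can tell apart. The sketch below is an assumption-laden model, not AI Transport's wire format: the AgentSignal variants and the decideAction function are hypothetical names chosen for illustration.

```typescript
// Hypothetical signal types; the names are illustrative, not a real wire format.
type AgentSignal =
  | { kind: "connection-closed" } // all a one-way stream lets the server observe
  | { kind: "cancel" } // explicit user interrupt on the channel
  | { kind: "redirect"; prompt: string }; // explicit interrupt plus new input

type AgentAction = "keep-generating" | "abort" | "abort-and-restart";

function decideAction(signal: AgentSignal): AgentAction {
  if (signal.kind === "cancel") {
    return "abort";
  }
  if (signal.kind === "redirect") {
    return "abort-and-restart";
  }
  // A closed connection may just be a bad network: keep publishing
  // so the client can catch up when it reconnects.
  return "keep-generating";
}
```

With only the first variant available, every disconnect forces a guess. With all three, a dropped connection and a deliberate stop lead to different, safe behavior.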

Two patterns are available, and the choice depends on what you want the user to see.

Cancel-then-send is the more common of the two. Call transport.cancel() and it publishes an explicit cancel signal on the channel: the server's abort fires, the LLM stream stops, and the turn ends with reason 'cancelled'. The session stays intact and the next message starts a clean turn. In the demo, Mike says "I want to visit a museum" while the activities agent is mid-search, the kind of redirect where there's no value in letting the original task finish. transport.cancel() fires, the search stops, and the agent starts fresh on the museum query.

Send-alongside is the alternative. It sends a new message without canceling the active turn, so both run concurrently: the agent continues the first response while processing the new input. You can cancel a specific turn using transport.cancel({ turnId }) if needed. Send-alongside is appropriate when you want the user to see both responses, for example a clarifying follow-up while the agent is finishing its response, or a comparison query where both outputs are useful.
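
The two patterns differ only in whether the active turn is cancelled before the new message goes out. Here is a minimal sketch of that bookkeeping, assuming a hypothetical TurnTracker class. Its cancel method loosely mirrors the shape of transport.cancel() and transport.cancel({ turnId }), but the assumption that a bare cancel stops every active turn is ours, not something the docs quoted here confirm.

```typescript
// Hypothetical turn bookkeeping; only loosely mirrors transport.cancel().
type TurnStatus = "active" | "cancelled";

interface Turn {
  turnId: string;
  prompt: string;
  status: TurnStatus;
}

class TurnTracker {
  private turns: Turn[] = [];
  private nextId = 1;

  send(prompt: string): string {
    const turnId = "turn-" + this.nextId++;
    this.turns.push({ turnId, prompt, status: "active" });
    return turnId;
  }

  // Assumption: no argument cancels every active turn; a turnId cancels one.
  cancel(options?: { turnId: string }): void {
    for (const turn of this.turns) {
      const targeted = !options || options.turnId === turn.turnId;
      if (turn.status === "active" && targeted) {
        turn.status = "cancelled";
      }
    }
  }

  activePrompts(): string[] {
    return this.turns.filter((t) => t.status === "active").map((t) => t.prompt);
  }
}

// Cancel-then-send: the new input supersedes the running turn.
function cancelThenSend(tracker: TurnTracker, prompt: string): string {
  tracker.cancel();
  return tracker.send(prompt);
}

// Send-alongside: the new turn runs next to whatever is already active.
function sendAlongside(tracker: TurnTracker, prompt: string): string {
  return tracker.send(prompt);
}
```

In the museum example from the demo, cancelThenSend leaves exactly one active turn. sendAlongside would leave two, and the returned turnId is what you would hold on to if you later wanted to cancel just one of them.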

For the full API reference, see the Interruption and barge-in docs.

Human-in-the-loop: getting full session context to an operator on any device

Most frameworks implement one variant of human-in-the-loop (HITL) and leave the other unsolved. But the distinction between them matters in production.

User-side HITL is the pattern where the agent pauses and asks the user to approve an action before executing, for example, "Should I book this flight?" The user approves or rejects, and the agent continues. Almost every agent framework has this.

Organization-side HITL is the harder case. The agent needs to escalate to an internal supervisor: someone who may be on a different device, in a different time zone, and who might not respond for hours. This is the customer support scenario: a human agent takes over mid-conversation, with full context, without the user re-explaining anything. Most frameworks leave this unsolved.

AI Transport handles both through the same mechanism. The agent defines a tool that pauses for human input rather than executing automatically. When the LLM decides it needs approval and invokes this tool, AI Transport stops the turn and publishes the pending request to the channel as a durable message.

Any connected client sees it and can resolve it by calling view.update(). A supervisor joining on a different device hours later sees the same pending request in the channel history.

The approval is a durable channel message, not a live server process waiting to time out. Calling view.update() triggers a continuation turn, and the agent picks up where it paused.
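
The durability argument is easier to see in code. The sketch below models the pending approval as an entry in a message log rather than a waiting process. ApprovalLog and its methods are hypothetical, and resolve is only a stand-in for what view.update() does; the real call's arguments are not shown here.

```typescript
// Hypothetical message log; a stand-in for durable channel history.
interface ApprovalRequest {
  kind: "approval-request";
  requestId: string;
  question: string;
}

interface ApprovalResult {
  kind: "approval-result";
  requestId: string;
  approved: boolean;
  resolvedBy: string;
}

type SessionMessage = ApprovalRequest | ApprovalResult;

class ApprovalLog {
  private messages: SessionMessage[] = [];

  // The agent's tool call pauses the turn and records the request.
  requestApproval(requestId: string, question: string): void {
    this.messages.push({ kind: "approval-request", requestId, question });
  }

  // Anyone who reads the history later can derive what is still pending.
  pending(): ApprovalRequest[] {
    const resolvedIds = this.messages
      .filter((m) => m.kind === "approval-result")
      .map((m) => m.requestId);
    const open: ApprovalRequest[] = [];
    for (const m of this.messages) {
      if (m.kind === "approval-request" && resolvedIds.indexOf(m.requestId) === -1) {
        open.push(m);
      }
    }
    return open;
  }

  // Stand-in for view.update(): record the result so the agent can continue.
  resolve(requestId: string, approved: boolean, resolvedBy: string): void {
    this.messages.push({ kind: "approval-result", requestId, approved, resolvedBy });
  }
}
```

Nothing in this model holds a timer or an open socket. A supervisor who reads the log hours later computes the same pending list as a client that was connected the whole time.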

Organization-side escalation is available today; implementation guides are being finalized.

For the full implementation detail, see the Human-in-the-loop docs.

Multi-agent coordination and shared state via Ably LiveObjects

Routing all agent activity through a central orchestrator creates a bottleneck. Every progress update has to pass through the coordinator before it appears to the user. At the scale of a multi-step, multi-agent workflow, that lag accumulates.

This demo takes a different approach. The orchestrator delegates to three specialist agents: flights, hotels, and activities, all running concurrently. Each specialist publishes its progress directly to Ably LiveObjects, bypassing the orchestrator entirely for user-facing updates.

The orchestrator waits for final results. The user sees live progress bars from all three agents updating in realtime, independently.

LiveObjects carries more than progress signals. User selections (flight, hotel, and activities choices) are written to LiveObjects state the moment the user makes a choice. When the user later asks "What's my current itinerary?", the orchestrator reads directly from LiveObjects rather than reconstructing context from chat history. If the user deleted a selection outside the chat thread, the agent sees that immediately. The conversation is one interface to the system; the source of truth is the state.
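
Here is a small sketch of why reading from state beats replaying chat history. ItineraryState is a hypothetical stand-in for the LiveObjects data, not its API. The one-choice-per-slot rule and the sample flight and hotel names are simplifications made up for the example.

```typescript
// Hypothetical shared state object; a stand-in for LiveObjects, not its API.
type ItinerarySlot = "flight" | "hotel" | "activity";

interface ItineraryEntry {
  slot: ItinerarySlot;
  choice: string;
}

class ItineraryState {
  private entries: ItineraryEntry[] = [];

  // Written the moment the user picks something, from chat or from the UI.
  select(slot: ItinerarySlot, choice: string): void {
    this.remove(slot);
    this.entries.push({ slot, choice });
  }

  // A deletion made outside the chat thread lands in the same state.
  remove(slot: ItinerarySlot): void {
    this.entries = this.entries.filter((e) => e.slot !== slot);
  }

  // What the orchestrator reads when asked "What's my current itinerary?"
  summary(): string {
    if (this.entries.length === 0) return "No selections yet";
    return this.entries.map((e) => e.slot + ": " + e.choice).join("; ");
  }
}

function demoItinerary(): string {
  const state = new ItineraryState();
  state.select("flight", "LHR to LIS, direct");
  state.select("hotel", "Hotel Alfama");
  state.remove("hotel"); // the user deletes the hotel in the UI, not in chat
  return state.summary();
}
```

An agent that rebuilt the itinerary from chat history alone would still list the hotel, because the deletion never appeared in the conversation.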

This matters because the user-facing update rate is decoupled from the orchestrator's coordination cycle. Each agent surfaces progress as fast as it produces it, with no relay step in between.

And presence adds a further signal. Agents can check whether the user is actually connected before streaming. An agent completing a search while the user is offline can push a notification rather than stream into a disconnected channel.
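
The presence check itself is a small decision. A rough sketch, assuming a hypothetical PresenceSet class and chooseRoute function (neither is the Ably presence API):

```typescript
// Hypothetical presence model; not the Ably presence API.
type DeliveryRoute = "stream" | "push-notification";

class PresenceSet {
  private members: string[] = [];

  enter(clientId: string): void {
    if (this.members.indexOf(clientId) === -1) {
      this.members.push(clientId);
    }
  }

  leave(clientId: string): void {
    this.members = this.members.filter((m) => m !== clientId);
  }

  isPresent(clientId: string): boolean {
    return this.members.indexOf(clientId) !== -1;
  }
}

// Decide how to tell the user that a search has finished.
function chooseRoute(presence: PresenceSet, userId: string): DeliveryRoute {
  return presence.isPresent(userId) ? "stream" : "push-notification";
}
```

Either way, the result still lands in the session channel. The route only decides how the user finds out that it is there.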

You can learn more about Ably LiveObjects here.

Session continuity, barge-in, and human handover aren't features that sit on top of an AI stack. They're properties of the delivery layer underneath it. The session channel is what makes them composable: the same mechanism that replays tokens on reconnect makes a pending approval durable, and lets a supervisor join a live conversation hours after it started. Most teams reach for these patterns eventually. The question is whether you build them yourself or start with infrastructure that already has them.

Docs go deeper: Ably AI Transport documentation.

Frequently asked questions

How do I implement barge-in so a user can interrupt an AI agent?

You need a bi-directional channel. Call transport.cancel() to publish an explicit cancel signal; the server's abort fires, and the LLM stream stops. Because the cancel is a specific channel event that the agent receives and acts on, it is distinct from a network drop, which plain HTTP streaming and SSE cannot tell apart from an intentional interrupt.

When should I use cancel-then-send versus send-alongside?

Cancel-then-send is the right choice when the user's new input supersedes the current task. In the demo, Mike's "I want to visit a museum" is a clear redirect: the original activity search is no longer relevant, and there is no reason to let it finish. Send-alongside is for cases where you want the user to see both responses: a follow-up question while an agent is finishing a detailed answer, or a comparison query where both outputs are useful. Use cancel-then-send as your default; reach for send-alongside when concurrent outputs are genuinely the goal.

What's the difference between user-side and organization-side HITL?

User-side HITL is the "should I book this?" pattern: the agent pauses and the user approves or rejects. Almost every agent framework handles this.

Organization-side HITL is different in kind. The approver is someone other than the user: an internal supervisor or compliance reviewer who may not be connected to the session at all when the pause happens. Most frameworks have no answer for this case, because it requires durable state: the pending approval has to outlast the original connection and be retrievable by someone who arrives hours later. In AI Transport, the pending tool call sits in channel history until someone resolves it. That is the pattern this article demonstrates.

How does a supervisor access the pending HITL approval if they're not already connected to the session?

The pending tool call is in channel history, not in a server process waiting to time out. When the supervisor subscribes to the channel, on any device, at any point after the turn paused, they receive the full conversation history including the pending approval request. No separate notification is required, though push notifications via FCM or APNs can alert the supervisor that approval is waiting. Once subscribed, the supervisor calls view.update() to provide the tool result and trigger the continuation turn.

How do multiple specialized agents publish progress updates to the same user session?

Each agent publishes directly to the session channel or Ably LiveObjects, without routing through the orchestrator. The user sees live updates from all agents simultaneously; the orchestrator only handles final results. This decouples the user-facing update rate from the orchestrator's coordination cycle.

Does Ably AI Transport work with any LLM or agent framework?

Yes. AI Transport operates at the session and delivery layer, below the orchestration layer, so it has no dependency on a specific LLM provider or agent framework. In practice, if you are already using LangGraph or CrewAI, AI Transport sits underneath your existing orchestration: your agent logic stays unchanged, and AI Transport handles the channel, session state, and delivery mechanics. Integrations are available for the Vercel AI SDK. Other frameworks use the core Ably SDK directly.

Continue reading

Conversation tree branching in @ably/ai-transport

The model is fine. The session is broken.

Engineering message appends for AI Transport: three vignettes
