12 min read · Updated Mar 30, 2026

Does your AI stack need a session layer? A maturity framework for teams building AI agents

Most teams building AI agents start with HTTP streaming. It's the right starting point. Every major agent framework defaults to it, it gets tokens on screen fast, and for a single-user prompt-response interaction it works well.

The question is when it stops being enough - and how to recognise that before it turns into user experience problems, engineering waste, and technical debt that constrains what your product can do.

Over the past year, I've spoken with over 40 leading AI companies building production agent applications - chat assistants, customer support agents, copilots, research tools. The pattern is consistent: teams that start on HTTP streaming hit a predictable set of walls as their product matures. Some hit them at hundreds of users, others at thousands, but the walls are the same.

This article maps those walls to a maturity curve. Not every team needs a dedicated session layer. But every team benefits from knowing where the transition point is.

The AI agent maturity curve

AI agent products tend to follow a progression. Each stage works - until the next set of requirements arrives.

Stage 1: Prompt-response (HTTP works fine)

What it looks like: User sends a message, agent streams back a response, interaction is complete. Single turn or simple multi-turn with history stored client-side or in a database.

Architecture: Direct HTTP streaming or SSE from agent to client. Session state in the browser tab. Stateless backend.

This works because: The interaction is short, single-device, single-participant. The connection lasts seconds. If it breaks, the user just asks again.

What teams build: A working demo, a beta product, early users. Engineering time goes to the model, the prompts, the tools. Transport is not a concern because it hasn't failed yet.

When to stay here: If your product is genuinely single-turn prompt-response - a search interface with AI summaries, a one-shot code generator - you may never need to leave this stage. HTTP streaming is the right tool.
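The Stage 1 shape can be sketched in a few lines. This is an illustrative simulation, not any real framework's API: a token stream coupled to a single connection, where a dropped connection loses the tail of the response.

```typescript
// A minimal sketch of Stage 1: a token stream coupled to one connection.
// All names here are illustrative, not a real framework API.

function* streamTokens(tokens: string[]): Generator<string> {
  for (const token of tokens) {
    yield token; // in production each token would be an SSE `data:` frame
  }
}

// The client consumes the stream. If the connection drops part-way
// (a tab close, a VPN reset), the remaining tokens are simply lost -
// there is no session identity or offset to resume from.
function consume(stream: Generator<string>, dropAfter = Infinity): string[] {
  const received: string[] = [];
  for (const token of stream) {
    received.push(token);
    if (received.length >= dropAfter) break; // simulated dropped connection
  }
  return received;
}
```

On a healthy connection `consume(streamTokens([...]))` delivers everything, which is why this stage feels complete; the `dropAfter` parameter exists only to show that an interrupted stream leaves no way to recover what was in flight.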

Stage 2: Agentic experiences (HTTP starts creaking)

What it looks like: Sessions that last minutes rather than seconds. Agents making tool calls, executing multi-step workflows, generating structured outputs. Users expect to see what the agent is doing - chain of thought, reasoning, tool call progress. Copilots sharing state with the user's application. Multiple agents coordinating on the same task, with one starting and another completing the work.

Architecture: Still HTTP streaming, but with growing amounts of custom infrastructure around it. Background workers to decouple generation from delivery. Message buffers or databases for reconnection. Custom heartbeat endpoints so the client can guess whether the agent is still alive.

The five diagnostic questions for this stage:

  1. What happens when the user refreshes the page, switches tabs, or changes device mid-stream? If the answer is "the response is lost and they have to ask again" or "we handle page refresh but not tab switch or device change" - you've hit the delivery gap. HTTP streaming couples the session to the connection. Any interruption - and tab switches and device changes are constant in real usage - means starting over.
  2. Can your user see what the agent is doing while it's working? If a long-running task looks identical whether the agent is thinking, executing a tool call, or has silently crashed - you've hit the visibility gap. HTTP streaming is one-way. There's no mechanism for the agent to publish structured status (thinking, tool calling, waiting for approval) that the client can render in real time. And there's no way for the user to know if the agent is still alive without polling a separate endpoint - which itself requires something stateful to poll against, and agents on stateless infrastructure don't naturally provide that.
  3. Can a user interrupt, redirect, or steer the agent while it's working? If the only option is to cancel the request (which kills the stream and loses the partial response) - you've hit the bidirectional gap. HTTP streaming is server-to-client only. Cancel, redirect, and steer all require a channel the client can publish to while the agent is streaming. Without it, the user is a passive observer of their own conversation.
  4. Can an agent hand off to another agent, or to a human, without breaking the session? If agent-to-agent handover requires agents becoming proxies, or exporting state and importing it into a new session, or if human escalation introduces an entirely different technology stack - you've hit the handover gap. HTTP provides no concept of multiple participants on the same session. Each transition is a state transfer, and each transfer is a point where context drops.
  5. How much engineering time is going to transport infrastructure? Not the agent, not the model, not the UI - the plumbing between them. Message buffers, reconnection logic, state serialisation, deduplication, custom presence, heartbeat endpoints. If that number is growing, you're building a session layer without calling it one.

The smell: You're spending engineering time on infrastructure that has nothing to do with what makes your product unique. The excitement is in the AI, but the engineering hours are going to transport reliability: the team is focused on the intelligence and on delivering meaningful results, and the transport layer keeps getting in the way.

When teams typically notice: The first enterprise customer whose VPN kills the stream. The first user complaint about losing context when they switch tabs or devices. The first time a PM asks "can the user interrupt and steer an agent mid-response?" and the answer is "not without building something custom."
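The workaround teams build at this stage - the message buffer from question 1 and the plumbing from question 5 - usually looks something like the sketch below: every message gets a monotonically increasing offset so a reconnecting client can replay what it missed. The names are illustrative; the point is that this is already a session layer, just one maintained as application code.

```typescript
// A sketch of the DIY Stage 2 workaround: buffer every message with an
// offset so a reconnecting client can replay what it missed.
// Illustrative names only - not a real library API.

interface BufferedMessage {
  offset: number;
  data: string;
}

class MessageBuffer {
  private messages: BufferedMessage[] = [];
  private nextOffset = 0;

  // Called by the background worker as the agent produces output.
  append(data: string): number {
    const offset = this.nextOffset++;
    this.messages.push({ offset, data });
    return offset;
  }

  // Called by a reconnecting client: replay everything after the
  // last offset it acknowledged seeing.
  replayAfter(lastSeen: number): BufferedMessage[] {
    return this.messages.filter((m) => m.offset > lastSeen);
  }
}
```

Once this exists, the team also owns its persistence, expiry, deduplication, and multi-device fan-out - which is exactly the growing infrastructure bill question 5 asks about.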

Stage 3: Production-grade AI agents (HTTP doesn't reach)

What it looks like: Rich, collaborative AI experiences. Agents sharing state with the user's application in real time. Agent-to-client tool calls where the agent requests information from the user's device. Copilots with ambient awareness of what the user is doing. Human-AI handover where context transfers seamlessly - warm transfer where the agent introduces the human, cold transfer where the human picks up with full history. Multiple devices where users start on desktop and continue on mobile. Background agents that complete work while the user is away and notify them on completion. Enterprise requirements: audit trails, compliance, observability, cost controls tied to actual usage.

Architecture: The session is decoupled from both the connection and the agent process. Any authenticated participant - user, agent, human operator, supervisor - can join, catch up from where they left off, and see current state. The session persists independently of any single connection, device, or participant.

What changes at this stage: The five diagnostic questions no longer have workaround answers. Page refresh and tab switch work because the session has a persistent identity and offset - the client reconnects and catches up deterministically. Device switching works because the session lives in the infrastructure, not the tab. Agent visibility works because presence is a first-class primitive - crash detection is immediate, not inferred from a dead stream. Interruption and steering work because agents and devices can both communicate independently of the current stream, allowing things like agent-to-client tool calls and users interrupting or redirecting conversations. Handover works because the session model treats all participants equally. And engineering time shifts from transport plumbing to product features.

The architectural pattern: This isn't a novel architecture in its fundamentals. Durable sessions have existed for collaborative applications for years - real-time editors, multiplayer games, trading systems. What is new is the combination: token streaming alongside collaborative state and presence, with offline capabilities like push notifications, all in one session layer. That combination is specific to AI agent applications, and it's what makes purpose-built infrastructure valuable.

The session layer sits between the agent framework and the client. It owns:

  • Durable transport: reliable delivery with automatic reconnection and offset-based replay. Protocol fallback handles network conditions by default. Token streaming with exactly-once semantics.
  • Collaborative state: session state shared across all participants. What the agent has published, what the user has seen, what tool calls are in flight. Agents can make tool calls to the client, and the client can respond.
  • Multi-device fan-out: one publish from the agent, N subscribers across all surfaces. Any device connecting to the same session catches up from where it left off.
  • Presence: which participants are on the session right now. Is the agent active? Is the user connected? Has a supervisor joined? When an agent disconnects, that fires immediately as an event - no polling. Routing to the currently live agent becomes a session operation, not a separate service.
  • Bidirectional control: agents and devices can both communicate with each other independently of the current stream of information. Users can interrupt, steer, and redirect. Agents can request information from the client. All explicit signals, not side effects of closing a TCP connection.
  • Offline delivery: when the user is away, the session doesn't stop. Background work completes, and the user gets a push notification on completion. Three delivery paths from one session: streaming when connected, catch-up on reconnect, notification when offline.
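The primitives above can be pictured with a toy model: a durable log with offsets, a presence set, and an explicit signal channel for client-to-agent control. This is entirely illustrative - a thought experiment, not the Ably AI Transport API.

```typescript
// A toy durable session illustrating the primitives above: a durable
// message log with offsets, presence tracking, and a bidirectional
// signal channel. Illustrative only - not a real product's API.

type Signal = "interrupt" | "steer" | "tool-call";

class DurableSession {
  private log: { offset: number; from: string; data: string }[] = [];
  private participants = new Set<string>();
  private signals: { from: string; signal: Signal }[] = [];

  // Any authenticated participant - user, agent, supervisor - can join.
  join(participant: string): void {
    this.participants.add(participant);
  }

  // In a real system, leaving fires a presence event immediately.
  leave(participant: string): void {
    this.participants.delete(participant);
  }

  isPresent(participant: string): boolean {
    return this.participants.has(participant);
  }

  publish(from: string, data: string): number {
    const offset = this.log.length;
    this.log.push({ offset, from, data });
    return offset;
  }

  // Deterministic catch-up: replay everything after the last-seen offset.
  catchUpFrom(lastSeen: number): { offset: number; from: string; data: string }[] {
    return this.log.slice(lastSeen + 1);
  }

  // Bidirectional control: explicit signals the agent can act on,
  // not side effects of closing a TCP connection.
  signal(from: string, signal: Signal): void {
    this.signals.push({ from, signal });
  }

  pendingSignals(): Signal[] {
    return this.signals.map((s) => s.signal);
  }
}
```

Note what falls out of the model: device switching is just `catchUpFrom` with a stored offset, crash detection is just presence, and "cancel" becomes an explicit `signal` rather than a dead stream.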

How teams have navigated this

Intercom built Fin, one of the most sophisticated AI customer support agents in the market, on a capable internal pub/sub system. As Fin's capabilities grew - streaming responses, multi-step workflows, agent-to-human handoffs across their network - the requirements outgrew what that system was designed for. They adopted Ably AI Transport, a purpose-built durable session layer. The result: faster first-token delivery, more reliable streaming, and engineering time redirected from transport maintenance to higher-value product work.

HubSpot needed their AI copilots to share state between agents and across devices - something traditional HTTP-based transport couldn't support. They'd also built a dedicated token batching layer to manage delivery costs of streaming tokens at scale. With Ably AI Transport handling transport and state, those custom layers became unnecessary. Copilots now share state natively, and batching is handled at the infrastructure level.

Teams at Suno and Duolingo are running similar architectures in production - treating the session layer as infrastructure rather than building it as application code.

Signals that you're ready for a session layer

The diagnostic questions in Stage 2 point to specific capability gaps. But the broader signal is simpler: your transport layer is constraining what your product can do.

Transport-level signals:

  • Connections dropping and users losing context on refresh, tab switch, or device change
  • No visibility into whether the agent is alive, thinking, or crashed
  • Users unable to interrupt or steer the agent mid-stream
  • Engineering time going to reconnection logic, state serialisation, deduplication - infrastructure that has nothing to do with your product

Experience-level signals:

  • You want copilots that share state with the user's application but HTTP streaming can't support it
  • You want agent-to-client tool calls but there's no bidirectional channel
  • Human handover introduces a completely different technology stack and users lose context
  • Agent-to-agent handover requires exporting and reimporting state or proxying streams between agents
  • You want to show chain of thought, reasoning steps, and tool call progress in real time
  • Users on mobile or multiple tabs can't see the same session
  • Background tasks complete with no way to notify the user

Enterprise signals:

  • Audit trail requirements you can't meet with HTTP request logs
  • Observability gaps - you can't answer "what happened in this session?"
  • Compliance requirements (SOC 2, HIPAA) that need session-level guarantees

Signals that you don't need one yet

  • Your product is genuinely single-turn. Search with AI summaries, one-shot code generation, document analysis. If the interaction completes in a single exchange, HTTP streaming is the right architecture.

  • You have no multi-participant or multi-device requirements. One user, one device, one agent is your product model for the foreseeable future.
  • Your focus is still on the intelligence. You're tuning prompts, improving tool calls, getting the agent to produce the right output. The transport layer isn't your bottleneck yet, and that's fine - solve the intelligence problem first.

What to look for in a session layer

If you decide you need one, evaluate against the five diagnostic questions. A session layer should:

  1. Resume deterministically - not "best effort" reconnection, but offset-based replay where the client catches up from exactly where it left off. Across page refreshes, tab switches, and device changes.
  2. Decouple from both client and agent - session state lives in the infrastructure, not the tab and not the agent process. Any authenticated participant connects and sees current state.
  3. Provide presence natively - agent health and user connectivity as first-class events, not application-level polling. Immediate disconnect detection, not timeout heuristics.
  4. Support bidirectional communication - agents and devices can both communicate with each other independently of the current stream. Users can interrupt and steer conversations. Agents can make tool calls to the client. Routing to the currently live agent is a session operation.
  5. Be framework-agnostic - the session layer shouldn't care whether you're using Vercel AI SDK, LangGraph, or something custom. If switching frameworks means rebuilding your transport, you've coupled the wrong layers.
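Point 5 in that checklist has a concrete shape: the agent framework should only ever see a narrow transport interface, so swapping frameworks touches nothing below it. A hypothetical sketch, with an in-memory stand-in that is also handy for tests:

```typescript
// Framework-agnosticism in code form: the agent framework talks to this
// interface and nothing else. Hypothetical names for illustration.

interface SessionEvent {
  offset: number;
  type: string; // e.g. "token", "status", "tool-call"
  data: string;
}

interface SessionTransport {
  publish(event: { type: string; data: string }): void;
  // Offset-based replay: deterministic resume, not best-effort reconnect.
  resumeFrom(offset: number): SessionEvent[];
}

// An in-memory stand-in. Any real session layer (or a different one)
// can replace it without the agent code changing.
class InMemoryTransport implements SessionTransport {
  private events: SessionEvent[] = [];

  publish(event: { type: string; data: string }): void {
    this.events.push({ offset: this.events.length, ...event });
  }

  resumeFrom(offset: number): SessionEvent[] {
    return this.events.filter((e) => e.offset > offset);
  }
}
```

If your LangGraph, Vercel AI SDK, or custom agent code only ever calls something shaped like `SessionTransport`, you've coupled the right layers.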

Different approaches cover different parts of this. ElectricSQL's Durable Streams protocol provides stream resumability - that's a genuine piece of the puzzle. If you're on Cloudflare Workers, Durable Objects can provide session state for simpler architectures. Where you need the full picture - durable transport, collaborative state, presence, multi-device, bidirectional control, human handover, offline notifications, enterprise compliance - that's what a purpose-built durable session layer is designed for. That's what Ably AI Transport provides, and it's what teams like Intercom, HubSpot, Suno, and Duolingo are running in production.

The maturity decision

Building on HTTP streaming isn't the wrong approach. It's the right place to start. The question is whether you recognise the transition point - from "HTTP streaming is serving us well" to "we're building our own session layer out of message buffers, background workers, and custom state management" - and make the architectural decision deliberately rather than discovering it through degraded user experience and engineering waste.

The category forming around this layer is called durable sessions. It's emerging independently across the ecosystem: Vercel built a pluggable ChatTransport interface, TanStack shipped a ConnectionAdapter, ElectricSQL built a durable streams protocol, and companies like EMQX have been using the term for years in the IoT space. Each is an acknowledgment that session infrastructure is a separate concern from agent orchestration and model inference.

Ably AI Transport provides the full durable session layer - durable transport, collaborative state, presence, multi-device fan-out, bidirectional control, and offline delivery. It's framework-agnostic, drops into existing stacks, and handles the infrastructure that teams at Intercom, HubSpot, Suno, and Duolingo decided to stop building themselves. If the diagnostic questions in this article surfaced gaps in your stack, the docs are a reasonable next step: ably.com/docs/ai-transport.
