TL;DR: Temporal is a control plane: it makes backend workflows crash-proof through durable execution. Temporal shipped Workflow Streams in Public Preview for Python at Replay 2026, which adds streaming from Python workflows to connected clients. It does not replay missed events when users reconnect, support multiple devices, notify offline users, or cover TypeScript and JavaScript. Durable sessions are the session layer you build alongside Temporal, not inside it.
Temporal solves a hard problem well. When an AI agent crashes mid-task, durable execution picks up from the last checkpoint without repeating work. The backend state is always consistent and recoverable.
The question that follows every Temporal adoption is the same: how do results get to the user's screen? Temporal doesn't answer that question, and its own team says so plainly.
What Temporal handles
Temporal makes backend logic crash-proof through event sourcing and checkpointing. It records every completed activity. If the process dies, a new instance reads the history and continues.
Long-running workflows (days, weeks, months) survive infrastructure failures without losing position. Human-in-the-loop patterns work because a paused workflow can receive a signal hours or days after the pause.
This is the control plane: the authoritative record of what the agent is doing and what it has done.
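The record-and-replay idea described in this section can be illustrated with a minimal sketch. It is not Temporal's SDK or its actual implementation: `History`, `run_workflow`, and the `crash_before` parameter are hypothetical names, and the history lives in memory rather than in a durable store. The point it shows is that a restarted run reads what already completed and skips it.

```python
from typing import Callable, Dict, List, Optional, Tuple


class History:
    """Append-only record of completed steps (a stand-in for a workflow history)."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, object]] = []

    def record(self, step: str, result: object) -> None:
        self.events.append((step, result))

    def completed(self) -> Dict[str, object]:
        return dict(self.events)


def run_workflow(
    history: History,
    steps: List[Tuple[str, Callable[[], object]]],
    crash_before: Optional[str] = None,
) -> List[object]:
    """Run steps in order, reusing recorded results instead of re-executing."""
    done = history.completed()
    results: List[object] = []
    for name, activity in steps:
        if name in done:
            results.append(done[name])  # replayed from history, not re-run
            continue
        if name == crash_before:
            raise RuntimeError(f"worker died before {name}")  # simulated crash
        result = activity()
        history.record(name, result)
        results.append(result)
    return results


calls: List[str] = []  # tracks which activities actually executed


def make_activity(name: str) -> Callable[[], object]:
    def activity() -> object:
        calls.append(name)
        return f"{name}-done"
    return activity


steps = [(n, make_activity(n)) for n in ("fetch", "summarize", "store")]
history = History()

try:
    run_workflow(history, steps, crash_before="store")  # first worker dies
except RuntimeError:
    pass

final = run_workflow(history, steps)  # a new instance picks up from the history
```

After the second run, each activity has executed exactly once: the first two results come from the history, and only `store` runs on the new instance.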
What Temporal doesn't handle
Temporal is a control plane by design. It records what the agent executed. Getting those results to a browser is outside its scope, and its maintainers say so directly.
Pushing workflow state to frontends. The official guidance from the Temporal team has been unambiguous: "Currently there's no built-in way to do this only with Temporal." When asked how to stream LLM activity responses to a custom UI, the canonical response was: "You need some service in the middle which the frontend app connects to in order to receive messages."
Temporal shipped Workflow Streams in Public Preview at Replay 2026. The feature uses Signal and Update primitives to stream token batches from Python workflows to connected clients. It is a contrib module, currently Python-only, and cross-language support is on the roadmap.
Workflow Streams does not address multi-device fan-out, push notifications for offline users, or session durability when clients disconnect. TypeScript and JavaScript developers have no equivalent today.
Buffering events for reconnecting browser clients. When a user's browser disconnects mid-workflow and reconnects, there is no replay. The workflow state is fine. The user sees nothing of what happened while they were gone.
Delivering to multiple simultaneous observers. Temporal's message passing primitives are point-to-point between a client and a specific workflow execution. Multiple browser tabs, devices, or users watching the same workflow must each independently poll or query. There is no broadcast primitive.
Notifying offline users when workflows complete. When an agent finishes a long task and the user has navigated away, there is no built-in notification mechanism. The result sits in workflow state until someone queries it.
Streaming LLM tokens with appropriate latency. Temporal's minimum workflow step latency is around 100ms, and a single activity adds approximately 120ms of overhead. High-frequency token streaming runs into both this latency floor and the 50MB event history limit. Temporal's documentation acknowledges "there are valid objections for Temporal to be in the hot path for interactive user transactions."
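A rough back-of-envelope calculation shows why the per-activity overhead matters for token streaming. It uses the approximate 120ms figure quoted above; the 50 tokens per second generation rate is a hypothetical number chosen only for illustration, and the model assumes publishes happen one after another.

```python
ACTIVITY_OVERHEAD_MS = 120     # approximate per-activity overhead quoted above
MODEL_TOKENS_PER_SECOND = 50   # hypothetical generation rate, for illustration only


def max_serial_publishes_per_second(overhead_ms: int) -> float:
    """Upper bound on back-to-back activity calls if each costs overhead_ms."""
    return 1000 / overhead_ms


def min_tokens_per_batch(tokens_per_second: int, overhead_ms: int) -> int:
    """Smallest batch size that keeps up with the model (integer ceiling division)."""
    return -(-(tokens_per_second * overhead_ms) // 1000)


publishes_per_second = max_serial_publishes_per_second(ACTIVITY_OVERHEAD_MS)
batch_size = min_tokens_per_batch(MODEL_TOKENS_PER_SECOND, ACTIVITY_OVERHEAD_MS)

print(f"{publishes_per_second:.1f} publishes/s, batch of {batch_size} tokens")
```

Under these assumptions, one activity per token tops out at roughly eight tokens per second, so tokens would have to be sent in batches of six or more to keep pace, which is why streaming through workflow steps ends up batched rather than token by token.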
The architectural reason these are separate
Durable execution and durable sessions are built around different state models, and that separation is intentional. Temporal keeps the backend job alive. What it doesn't address is the channel between your agent and your user.
Temporal records each completed step permanently, and you can replay from any point. This immutability is the whole point: it is what guarantees consistency after a crash. Temporal is optimized for recording what the agent did, step by step.
Session state is mutable. A streaming response builds up token by token and then becomes a complete message. A partial response gets rewritten cleanly when the agent retries.
These models need to stay separate. A user's stop request needs to reach the frontend immediately but shouldn't appear in the execution history. Multiple devices watching the same agent need fan-out that the execution model doesn't represent.
Some teams try to store conversation state inside the Temporal workflow. The 50MB history limit gets consumed by conversation events faster than expected. One developer's polling workflow exhausted the 5,000-event limit in around three days.
The immutable execution log can't represent message edits, deletions, or partial stream rewrites. Fan-out to multiple devices still requires a separate delivery layer regardless.
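The difference between the two state models can be made concrete with a small sketch. All names here (`ExecutionEvent`, `SessionMessage`, the token strings) are hypothetical and not taken from any library: the execution log only ever grows, while the session message is rewritten in place when the agent retries.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExecutionEvent:
    """Immutable entry in the execution log: what the agent did."""
    attempt: int
    kind: str


@dataclass
class SessionMessage:
    """Mutable view of what the user currently sees."""
    message_id: str
    text: str = ""
    complete: bool = False

    def append(self, token: str) -> None:
        self.text += token

    def reset(self) -> None:
        # A retry discards the partial response instead of appending to it.
        self.text = ""
        self.complete = False

    def finish(self) -> None:
        self.complete = True


execution_log: List[ExecutionEvent] = []
message = SessionMessage("msg-1")

# Attempt 1 streams a partial response, then fails.
execution_log.append(ExecutionEvent(1, "started"))
for token in ["The ", "answ"]:
    message.append(token)
execution_log.append(ExecutionEvent(1, "failed"))

# Attempt 2: the log keeps both attempts, the session shows only the clean result.
message.reset()
execution_log.append(ExecutionEvent(2, "started"))
for token in ["The ", "answer ", "is 42."]:
    message.append(token)
message.finish()
execution_log.append(ExecutionEvent(2, "completed"))
```

The execution log ends with four events covering both attempts, while the user-facing message contains a single complete response with no trace of the failed partial one.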
Your options alongside Temporal
Teams working on this problem keep arriving at the same five approaches.
Polling the query handler
This is the simplest starting point. Your workflow exposes state via a query handler and the frontend calls it on a timer. For low-traffic applications with infrequent updates, it works.
At scale the cost mounts. At 1,000 active users polling every two seconds, that's 500 requests per second before any workflow work starts. Temporal Cloud prices by actions and the visibility API has rate limits that degrade under this load.
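The arithmetic behind that figure is simple enough to check directly. The function name below is hypothetical, and the daily total is just the same rate extended over 24 hours; it says nothing about actual pricing.

```python
def polling_requests_per_second(active_users: int, interval_seconds: float) -> float:
    """Steady-state request rate when every user polls on a fixed timer."""
    return active_users / interval_seconds


rps = polling_requests_per_second(active_users=1000, interval_seconds=2)
requests_per_day = rps * 60 * 60 * 24

print(f"{rps:.0f} requests/s, {requests_per_day:,.0f} requests/day")
```

At 1,000 users and a two-second interval, that is 500 requests per second, or 43.2 million requests per day, regardless of whether any workflow state has changed.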
Redis pub/sub
A common next step: the workflow publishes tokens to Redis and a relay layer forwards them to SSE endpoints. This decouples generation from delivery and handles simple single-device cases where the user stays connected.
Fire-and-forget delivery means events published during a disconnect are gone. The developer at Architecting Bytes who built this architecture in production was direct: "This is NOT a durable delivery format, it is purely fire-n-forget." There is no catch-up for reconnecting clients.
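The fire-and-forget behavior can be modeled in a few lines without a Redis server. This is an in-memory stand-in, not the Redis client API: `FireAndForgetBus` and the channel name `run-42` are hypothetical, and the only property it mirrors is that a publish reaches whoever is subscribed at that instant and is not stored.

```python
from typing import Callable, Dict, List


class FireAndForgetBus:
    """In-memory model of pub/sub semantics: no storage, no catch-up."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> Callable[[], None]:
        self._subscribers.setdefault(channel, []).append(handler)

        def unsubscribe() -> None:
            self._subscribers[channel].remove(handler)

        return unsubscribe

    def publish(self, channel: str, message: str) -> int:
        """Deliver to current subscribers only; return how many received it."""
        handlers = list(self._subscribers.get(channel, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = FireAndForgetBus()
received: List[str] = []

unsubscribe = bus.subscribe("run-42", received.append)
bus.publish("run-42", "tok-1")

unsubscribe()                                # the browser disconnects
delivered = bus.publish("run-42", "tok-2")   # nobody is listening, message is dropped

bus.subscribe("run-42", received.append)     # the browser reconnects
bus.publish("run-42", "tok-3")
```

The reconnected client ends up with `tok-1` and `tok-3`. Nothing in the bus remembers `tok-2`, so there is nothing to catch up from.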
Sticky sessions
Routing every request from the same user to the same server preserves whatever is in memory. When that server restarts or you scale out, the in-memory state goes with it. The user reconnects on a different machine that has no record of what the agent was doing. Sticky sessions are server affinity, not session persistence.
Custom WebSocket server
For a bidirectional channel and persistent connections, teams move to dedicated WebSocket infrastructure outside Temporal. You get a proper two-way channel, but connection management, reconnection logic, and lifecycle handling all become yours to own.
Temporal co-founder Maxim Fateev acknowledged the gap directly on the community forum: "There is no ideal solution yet. In the longer term I think we want to provide direct websocket support by the platform." Most teams that go this route end up rebuilding most of a realtime platform.
Purpose-built delivery layer
A delivery layer designed for agent workflows handles session durability alongside Temporal's execution durability. One publish from a Temporal activity reaches all subscribed clients. Reconnecting clients catch up from their last position and offline users get push notifications when workflows complete. The channel persists across Temporal workflow restarts. Together, Temporal's execution durability and the delivery layer's session durability cover the full stack.
What a session layer provides
The right architecture separates the concerns. Temporal manages execution state: the authoritative, immutable record of what the agent did. A session layer manages session state: the mutable, streamable, multi-device view of what the user sees.
When a workflow activity completes, it publishes the result to a session channel. Any client subscribed to that channel receives it. A user who disconnects and reconnects gets caught up from their last position.
A second device joins the same channel and receives the same stream. When the agent crashes and Temporal restarts it, the new instance checks what was already published and resumes. The user sees continuation, not duplication.
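The behavior described above (replay from a last position, a second device reading the same stream, and no duplication after a restart) can be sketched with an in-memory channel. This is an illustration under stated assumptions, not any vendor's API: `SessionChannel`, `read_after`, and the message IDs are hypothetical, and a real session layer would persist the log and push updates rather than wait to be read.

```python
from typing import List, Set, Tuple


class SessionChannel:
    """Ordered, replayable log keyed by a stable channel name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._log: List[Tuple[int, str, str]] = []  # (sequence, message_id, payload)
        self._seen_ids: Set[str] = set()

    def publish(self, message_id: str, payload: str) -> bool:
        """Append once per message_id; a republish after a restart is a no-op."""
        if message_id in self._seen_ids:
            return False
        self._seen_ids.add(message_id)
        self._log.append((len(self._log) + 1, message_id, payload))
        return True

    def read_after(self, last_seq: int) -> List[Tuple[int, str]]:
        """Everything a client missed since the sequence number it last saw."""
        return [(seq, payload) for seq, _id, payload in self._log if seq > last_seq]


channel = SessionChannel("agent-run-42")

# The agent publishes two step results; the phone sees the first, then drops offline.
channel.publish("step-1", "searched docs")
channel.publish("step-2", "drafted answer")
phone_last_seq = 1

# The agent crashes and restarts: replayed publishes are deduplicated by ID.
republished = [
    channel.publish("step-1", "searched docs"),
    channel.publish("step-2", "drafted answer"),
]
channel.publish("step-3", "final answer")

phone_catch_up = channel.read_after(phone_last_seq)  # reconnecting device
laptop_view = channel.read_after(0)                  # second device joining mid-run
```

The phone receives steps two and three in order, the laptop receives all three, and the restarted agent's repeated publishes add nothing to the log.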
Which path is right for your situation
| Scenario | Durable sessions? | Why |
| --- | --- | --- |
| Server-to-server batch, no user-facing component | No | Temporal handles this entirely |
| Simple single-user workflow, user stays on the page | Minimal | Polling or Workflow Streams (Python) covers the basics |
| Long-running agent, user may navigate away | Yes | Notifies user on return, on any device |
| Human-in-the-loop approval flows | Yes | Pushes approval request to the right person across devices |
| Live progress visibility during multi-step agents | Yes | Real-time updates without polling workers |
| Agent cancel or steer mid-run | Yes | Delivers control signals in milliseconds, not on next workflow turn |
| Multi-device or multi-tab continuity | Yes | Live stream that follows the user to a second device (ably.com/ai-transport) |
What to look for in a session layer for Temporal
Most implementations cover one or two of these properties. All four together is what separates a durable session from the workarounds that break in production.
Persistent. Messages generated while a participant is disconnected need to be buffered and replayed in order when they reconnect. Any connection drop becomes a data loss event without this. The session layer should buffer events and replay them in sequence from the client's last position, not just return current state. → Reconnection and recovery
Addressable. The session needs a stable identity that both sides can reconnect to, across devices and connection resets. An ephemeral connection ID that disappears when the connection closes doesn't cut it. The channel should persist independently of any single workflow execution, so when Temporal restarts a crashed workflow, the session continues. → Sessions concept
Presence-aware. Both sides need to know when the other connects or disconnects. Without this, agents burn inference on responses nobody is receiving, and UIs show activity indicators for agents that stopped responding. These should be first-class state transitions, not conditions inferred from timeouts. → Agent presence
Multi-participant coherent. Agent and user can reconnect in any order, from any device. The session stays consistent regardless of which side comes back first. A device that joins mid-run gets the full history from the beginning. This is the property most production AI systems don't have today. → Multi-device sessions
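Of the four properties, presence is the easiest to leave implicit, so here is a minimal sketch of what treating it as first-class state transitions could look like. The `Presence` class, its method names, and the client IDs are hypothetical and in-memory only; a real session layer would derive enter and leave events from connection state.

```python
from typing import List, Set, Tuple


class Presence:
    """Tracks who is attached to a session as explicit enter/leave transitions."""

    def __init__(self) -> None:
        self.members: Set[str] = set()
        self.transitions: List[Tuple[str, str]] = []

    def enter(self, client_id: str) -> None:
        if client_id not in self.members:
            self.members.add(client_id)
            self.transitions.append(("enter", client_id))

    def leave(self, client_id: str) -> None:
        if client_id in self.members:
            self.members.remove(client_id)
            self.transitions.append(("leave", client_id))

    def anyone_else_present(self, self_id: str) -> bool:
        """True if at least one other participant is currently attached."""
        return any(member != self_id for member in self.members)


presence = Presence()
presence.enter("agent")
presence.enter("user-phone")
watched_while_connected = presence.anyone_else_present("agent")

presence.leave("user-phone")  # the user closes the tab
watched_after_leave = presence.anyone_else_present("agent")
```

With this in place, an agent can check whether anyone is watching before spending inference on a streamed response, and a UI can react to the agent's `leave` transition instead of waiting for a timeout.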
Ably AI Transport provides a session layer designed for Temporal and other durable execution frameworks. It handles resumable streaming, multi-device delivery, and push notifications for offline users. Visit the Ably AI Transport overview, read the documentation, or sign up free to start building.