Going to production

This page is the production checklist for AI Transport. Work through each item before you move traffic to production. Where a topic needs more detail than fits inline, the item links to it.

The checklist covers the operational concerns that cut across every feature. Each feature page also has its own edge cases and unhappy paths section for feature-specific issues.

Pricing

AI Transport runs on Ably's usage-based billing. Costs scale with published and delivered messages, channel time, and connection time. The exact rates depend on your package; contact Ably for enterprise pricing.

Two AI Transport patterns affect cost predictably:

Token streaming uses Ably's append rollup to compact many token events into fewer published messages. The default rollup window is 40 ms (about 25 messages per second per stream). See token streaming for tuning.
Channel history retention determines how long the session is hydratable from Ably alone. Longer retention costs more storage; pair with an external store for long-lived conversations.
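To make the 40 ms figure concrete, here is a simplified TypeScript model of rollup. It assumes fixed, aligned windows and a hypothetical `TokenEvent` shape. It is not Ably's implementation, only the arithmetic behind "at most one published message per window per stream".

```typescript
// Hypothetical shape for a token emitted by the model.
interface TokenEvent {
  atMs: number; // arrival time in ms since the stream started
  text: string;
}

// One message as it would be published after rollup.
interface RolledUpMessage {
  windowStartMs: number;
  text: string;
}

// Simplified model: tokens that arrive in the same fixed window of `windowMs`
// are concatenated into one message. Empty windows publish nothing.
// Assumes `tokens` is in arrival order.
function rollUp(tokens: TokenEvent[], windowMs = 40): RolledUpMessage[] {
  const byWindow = new Map<number, string>();
  for (const t of tokens) {
    const start = Math.floor(t.atMs / windowMs) * windowMs;
    byWindow.set(start, (byWindow.get(start) ?? "") + t.text);
  }
  return Array.from(byWindow.entries())
    .sort(([a], [b]) => a - b)
    .map(([windowStartMs, text]) => ({ windowStartMs, text }));
}

// Upper bound on published messages per second for a single stream.
function maxMessagesPerSecond(windowMs: number): number {
  return 1000 / windowMs;
}
```

Under this model, a stream that emits a token every 5 ms for one second publishes 25 messages at the default 40 ms window instead of 200.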

For a worked example, see the AI chatbot pricing example.

Limits and quotas

The platform limits page covers the hard limits in detail. The ones that matter most for AI Transport:

Connection inbound message rate. A single stream that exceeds the per-connection rate triggers rate limiting. Use the append rollup window to control the publish rate. See token streaming.
Channel message rate. Concurrent turns and multi-agent setups share the channel's rate budget.
Message size. Large tool outputs or attachments can approach or exceed the per-message size limit. Stream large results, or persist them externally and publish a reference such as a URL.
Channel history retention. Configure retention through the channel namespace settings. Plan for the longest session you expect to hydrate from Ably alone.
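The rate and size limits reduce to arithmetic you can check before launch. The TypeScript sketch below assumes a hypothetical `LimitConfig` that you fill in from the platform limits page. It also assumes each stream publishes at most once per rollup window and reserves an arbitrary 1 KiB of headroom for message envelope fields. None of the helper names are part of AI Transport.

```typescript
// Fill these in from the platform limits page for your package.
// No defaults are given because the values here are not Ably's.
interface LimitConfig {
  connectionInboundPerSecond: number; // per-connection publish rate limit
  maxMessageBytes: number; // per-message size limit
}

// Publish rate when `concurrentStreams` streams share one connection and each
// publishes at most once per rollup window.
function publishRate(concurrentStreams: number, rollupWindowMs: number): number {
  return concurrentStreams * (1000 / rollupWindowMs);
}

// Smallest whole-millisecond rollup window that keeps the connection at or
// under its inbound limit.
function minRollupWindowMs(concurrentStreams: number, limits: LimitConfig): number {
  return Math.ceil((concurrentStreams * 1000) / limits.connectionInboundPerSecond);
}

type ToolOutputPlan =
  | { kind: "inline"; bytes: number }
  | { kind: "external"; bytes: number };

// Decide whether a tool result fits in one message or should be persisted
// externally and referenced. Size is the UTF-8 byte length of its JSON form.
function planToolOutput(
  output: unknown,
  limits: LimitConfig,
  headroomBytes = 1024, // assumed allowance for name, extras and other envelope fields
): ToolOutputPlan {
  const bytes = new TextEncoder().encode(JSON.stringify(output)).length;
  return bytes + headroomBytes <= limits.maxMessageBytes
    ? { kind: "inline", bytes }
    : { kind: "external", bytes };
}
```

For example, with an illustrative limit of 50 messages per second, two streams at the default 40 ms window (25 messages per second each) already use the whole budget, and three streams would need a window of at least 60 ms.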

Monitoring and observability

Watch three signals:

Channel publish rate per session, against the connection's inbound rate limit.
Turn lifecycle outcomes (completed, cancelled, failed). A rising failure rate is your first signal that something is wrong with model orchestration, auth, or capacity.
Server-side LLM latency and error rate, from your model provider. The Ably side is decoupled from the LLM; track them separately.
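A minimal sketch of the turn-outcome signal, assuming you record each turn's outcome from your own code when the turn ends. `TurnOutcomeTracker` is a hypothetical in-process helper. In production you would more likely emit one counter per outcome to your metrics system and compute the rate there. Cancellations are tracked but not counted as failures, because users cancelling turns is normal behaviour.

```typescript
type TurnOutcome = "completed" | "cancelled" | "failed";

// Keeps the most recent `capacity` outcomes and reports the failure rate
// over that sliding window.
class TurnOutcomeTracker {
  private readonly outcomes: TurnOutcome[] = [];
  private readonly capacity: number;

  constructor(capacity = 100) {
    this.capacity = capacity;
  }

  record(outcome: TurnOutcome): void {
    this.outcomes.push(outcome);
    // Drop the oldest outcome once the window is full.
    if (this.outcomes.length > this.capacity) this.outcomes.shift();
  }

  // Fraction of turns in the window that failed; 0 when nothing is recorded.
  failureRate(): number {
    if (this.outcomes.length === 0) return 0;
    const failed = this.outcomes.filter((o) => o === "failed").length;
    return failed / this.outcomes.length;
  }

  // True when the windowed failure rate exceeds `threshold`.
  shouldAlert(threshold = 0.05): boolean {
    return this.failureRate() > threshold;
  }
}
```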

Wire these into your existing dashboards through Ably's integrations: webhooks, Kafka, and other sinks expose channel activity. For usage data, Ably also offers the Stats API.

Auth hardening

The defaults are safe for development. For production:

Scope channel capabilities to the specific conversation channel (or the namespace) the user needs access to. Grant publish, subscribe, and history only on channels the user owns.
Set clientId in the token, not on the client. The Ably service verifies tokens; client-set IDs are unverified and spoofable.
Use short token lifetimes and authUrl (or authCallback) so the SDK refreshes tokens automatically. One-hour tokens are a reasonable default.
Implement token revocation for forced sign-out.
For cancel authorisation, set the onCancel hook on every turn. Reject cancels from clients that do not own the turn unless you specifically want admin-style global cancels.
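The capability and cancel rules can be sketched as two small functions. `buildTokenParams` returns an object shaped like Ably token params (`clientId`, `capability`, and `ttl` in milliseconds) that an authUrl endpoint could pass to ably-js's `auth.createTokenRequest()` after authenticating the user; check the ably-js reference for the exact parameter types. The `conversations:` channel prefix is a hypothetical naming scheme. `mayCancel` is a hypothetical ownership check that your `onCancel` hook could delegate to; the hook's real signature is in the AI Transport reference.

```typescript
// Ably capability operations used in this sketch.
type CapabilityOp = "publish" | "subscribe" | "history";

interface TurnTokenParams {
  clientId: string; // set server-side, so the client cannot spoof it
  capability: Record<string, CapabilityOp[]>;
  ttl: number; // token lifetime in milliseconds
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Scope the token to the conversations this user owns. `ownedConversationIds`
// must come from your own database, never from the request body.
function buildTokenParams(
  userId: string,
  ownedConversationIds: string[],
  ttl = ONE_HOUR_MS,
): TurnTokenParams {
  const capability: Record<string, CapabilityOp[]> = {};
  for (const id of ownedConversationIds) {
    capability[`conversations:${id}`] = ["publish", "subscribe", "history"];
  }
  return { clientId: userId, capability, ttl };
}

// Only the turn's owner, or an explicitly allowed admin, may cancel it.
// Requests without a verified clientId are always rejected.
function mayCancel(
  requesterClientId: string | undefined,
  turnOwnerClientId: string,
  admins: ReadonlySet<string> = new Set<string>(),
): boolean {
  if (!requesterClientId) return false;
  return requesterClientId === turnOwnerClientId || admins.has(requesterClientId);
}
```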

See Set up authentication for the recipes and the authentication concept page for the model.

Data shipping and retention

If you persist conversations beyond the channel's retention:

Persist completed turns to your own store on run.end(). Hydrate sessions from the store on demand; the channel handles live activity.
Channel history retention is configured per namespace. Pick a value that matches your live-recovery window, not your total retention target. The external store covers anything longer.
For analytics or search, stream channel activity to your data warehouse through webhooks or Kafka. This is one-way; AI Transport reads from the channel, not the warehouse.
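A sketch of the split between your store and the channel, assuming a hypothetical `TurnRecord` shape and an in-memory `TurnStore` standing in for your database. You would call `save()` from the same code path that calls `run.end()`, and hydrate a session by merging the stored turns with whatever the channel's history still holds. Turns are keyed by ID so that saving the same turn twice (for example after a retry) is idempotent, and the channel copy wins on overlap because it may be newer.

```typescript
// Hypothetical shape of a persisted turn.
interface TurnRecord {
  turnId: string;
  startedAt: number; // epoch ms
  status: "streaming" | "completed" | "cancelled" | "failed";
  text: string;
}

// In-memory stand-in for your database, keyed by session then turn ID.
class TurnStore {
  private readonly sessions = new Map<string, Map<string, TurnRecord>>();

  save(sessionId: string, turn: TurnRecord): void {
    let turns = this.sessions.get(sessionId);
    if (!turns) {
      turns = new Map<string, TurnRecord>();
      this.sessions.set(sessionId, turns);
    }
    turns.set(turn.turnId, turn); // overwrite keeps saves idempotent
  }

  load(sessionId: string): TurnRecord[] {
    return Array.from(this.sessions.get(sessionId)?.values() ?? []);
  }
}

// Merge stored turns with turns recovered from channel history.
// Channel entries overwrite stored ones with the same ID.
function mergeSession(stored: TurnRecord[], fromChannel: TurnRecord[]): TurnRecord[] {
  const byId = new Map<string, TurnRecord>();
  for (const t of stored) byId.set(t.turnId, t);
  for (const t of fromChannel) byId.set(t.turnId, t);
  return Array.from(byId.values()).sort((a, b) => a.startedAt - b.startedAt);
}
```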

Compliance

Ably is SOC 2 Type II certified and HIPAA compliant. See security and compliance for more information.

For the application-side:

Message content, including tool inputs and outputs, is conversation content. It is visible to every client whose capability allows it to subscribe to the channel.
For region-pinned data residency, scope capabilities to namespaces hosted in the right region. See Ably's edge network.

Deployment notes

A few patterns that come up:

Serverless agents are fine. Each turn creates a server transport, processes the turn, and tears down. The session is on the channel, not in the agent process.
after() (Next.js) or an equivalent post-response continuation lets the HTTP route return before the model finishes streaming. Without it, the streaming budget is bound to the request timeout.
Multiple agent regions are supported. Ably's infrastructure keeps channel state available in every region: an agent in one region publishes, and clients in any region see the same state with low latency from their nearest edge.
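The shape of a route that returns before streaming finishes is sketched below, kept framework-agnostic so it runs on its own. The injected `after` parameter stands in for Next.js's `after()` from `next/server`, and `streamTurn` is a hypothetical function that runs the model and publishes to the channel; neither is an AI Transport API. In a real Next.js route you would call `after()` directly and return a `Response`.

```typescript
// Stand-in for Next.js `after()`: schedules work that continues after the
// response has been sent.
type After = (task: () => Promise<void>) => void;

interface HandlerResult {
  status: number;
  body: string;
}

// Hypothetical route handler. It schedules streaming as post-response work and
// returns immediately instead of awaiting the model.
function handleTurnRequest(
  turnId: string,
  streamTurn: (turnId: string) => Promise<void>,
  after: After,
): HandlerResult {
  after(() => streamTurn(turnId));
  // Clients follow the turn on the channel, not on this HTTP response.
  return { status: 202, body: JSON.stringify({ turnId }) };
}
```

Returning 202 with the turn ID keeps the HTTP response small, because nothing depends on the request staying open. The background work is typically still bound by your platform's function duration limit, which is the case durable execution addresses.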

Durable execution

If an agent turn spans multiple stages (inference, tools, follow-up inference) or exceeds a serverless function's runtime budget, run the agent inside a workflow engine such as Temporal or Vercel WDK. Each stage becomes its own retryable activity. AI Transport's AgentSession.adoptRun plus AgentRun.createStep({ stepId }) let a retry of the same stage supersede the failed attempt's channel output rather than append beside it. See Durable execution.

When a turn exhausts its retries, your failure path needs to adopt the Run and publish run.end({ reason: 'error' }). Otherwise the Run stays open on the channel and every observer's UI stays in the streaming state.
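The stepId rule and the failure path fit together as in the sketch below. `RunLike` and `StepLike` are hypothetical minimal shapes standing in for the adopted Run and its steps; check the AI Transport reference for the real `AgentSession.adoptRun` and `createStep` signatures. The in-process retry loop stands in for the workflow engine's own activity retries. The key choice is deriving `stepId` from values that are identical on every attempt (workflow ID plus stage name), never from the attempt number or a random ID.

```typescript
// Hypothetical minimal shapes; not the AI Transport API.
interface StepLike {
  publish(text: string): Promise<void>;
}
interface RunLike {
  createStep(opts: { stepId: string }): StepLike;
  end(opts: { reason: string }): Promise<void>;
}

// Stable across retries of the same stage, so a retry supersedes the
// failed attempt's output instead of appending beside it.
function stepIdFor(workflowId: string, stage: string): string {
  return `${workflowId}:${stage}`;
}

async function runStageWithRetries(
  run: RunLike,
  workflowId: string,
  stage: string,
  work: (step: StepLike) => Promise<void>,
  maxAttempts = 3,
): Promise<void> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const step = run.createStep({ stepId: stepIdFor(workflowId, stage) });
    try {
      await work(step);
      return; // success: end the Run through your normal completion path
    } catch (err) {
      lastError = err;
    }
  }
  // Retries exhausted: close the Run so observers leave the streaming state.
  await run.end({ reason: "error" });
  throw lastError;
}
```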

When adopting AI Transport in production, also consider

Each feature page has its own edge cases section. The items most worth reviewing on each:

Token streaming: rollup tuning, partial responses on cancel.
Cancellation: cancel authorisation, abort signal handling in tools.
Reconnection and recovery: the live recovery window, history capability scoping.
Multi-device sessions: clientId uniqueness, capability scoping per user.
Tool calling: tool timeouts honouring abort signals, large tool outputs.
Human-in-the-loop: pending approvals that never resolve.
Durable execution: stepId sourcing, adopt-run status gates, workflow-level cleanup.
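For the tool-calling item, a tool timeout should abort the tool the same way a user cancel does. The TypeScript sketch below combines the turn's abort signal with a per-tool timeout using the standard `AbortController`. `runToolWithTimeout` and `slowTool` are hypothetical helpers, and a tool only stops early if it honours the signal it is given.

```typescript
// Run `tool` with its own timeout while also honouring the turn's abort
// signal. Whichever fires first aborts the tool's signal.
async function runToolWithTimeout<T>(
  tool: (signal: AbortSignal) => Promise<T>,
  turnSignal: AbortSignal,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const onTurnAbort = () => controller.abort(turnSignal.reason);
  if (turnSignal.aborted) controller.abort(turnSignal.reason);
  else turnSignal.addEventListener("abort", onTurnAbort, { once: true });
  const timer = setTimeout(
    () => controller.abort(new Error(`tool timed out after ${timeoutMs} ms`)),
    timeoutMs,
  );
  try {
    return await tool(controller.signal);
  } finally {
    // Clean up so neither the timer nor the listener outlives the tool call.
    clearTimeout(timer);
    turnSignal.removeEventListener("abort", onTurnAbort);
  }
}

// Example tool that honours its signal: resolves after `ms` unless aborted.
function slowTool(ms: number) {
  return (signal: AbortSignal) =>
    new Promise<string>((resolve, reject) => {
      if (signal.aborted) return reject(signal.reason);
      const t = setTimeout(() => resolve("done"), ms);
      signal.addEventListener(
        "abort",
        () => {
          clearTimeout(t);
          reject(signal.reason);
        },
        { once: true },
      );
    });
}
```

On runtimes that support it, `AbortSignal.any([turnSignal, AbortSignal.timeout(timeoutMs)])` builds the combined signal in one call. The same idea applies to human-in-the-loop: give the wait for an approval a timeout so a turn cannot block indefinitely on an approval that never arrives.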