Going to production
This page is the production checklist for AI Transport. Work through each item before traffic moves to production. Where a topic needs more detail than fits inline, the item links to it.
Each feature page also has its own edge cases and unhappy paths section; this page covers the cross-cutting concerns.
Pricing
AI Transport runs on Ably's usage-based billing. Costs scale with published and delivered messages, channel time, and connection time. The exact rates depend on your package; contact Ably for enterprise pricing.
Two AI Transport patterns affect cost predictably:
- Token streaming uses Ably's append rollup to compact many token events into fewer published messages. The default rollup window is 40 ms (about 25 messages per second per stream). See token streaming for tuning.
- Channel history retention determines how long the session is hydratable from Ably alone. Longer retention costs more storage; pair with an external store for long-lived conversations.
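The rollup arithmetic above can be sketched as a back-of-envelope estimate. This is not Ably's billing formula: the function names are hypothetical, and the sketch assumes each stream publishes at most one rolled-up message per rollup window.

```typescript
// Upper bound on published messages per second for one stream, assuming
// (for illustration) at most one rolled-up message per rollup window.
function maxMessagesPerSecond(rollupWindowMs: number): number {
  if (rollupWindowMs <= 0) {
    throw new RangeError("rollupWindowMs must be positive");
  }
  return 1000 / rollupWindowMs;
}

// Upper bound on published messages for responses that stream for
// `streamSeconds` each, across `concurrentStreams` parallel streams.
function maxPublishedMessages(
  rollupWindowMs: number,
  streamSeconds: number,
  concurrentStreams: number = 1
): number {
  const perStream = Math.ceil(maxMessagesPerSecond(rollupWindowMs) * streamSeconds);
  return perStream * concurrentStreams;
}

const defaultWindowRate = maxMessagesPerSecond(40); // 25, matching the default window
```

A wider window lowers the bound (and the cost) at the price of chunkier token delivery, so treat the window as a latency versus cost dial.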
For a worked example, see the AI chatbot pricing example.
Limits and quotas
The platform limits page covers the hard limits in detail. The ones that matter most for AI Transport:
- Connection inbound message rate. A single stream that exceeds the per-connection rate triggers rate-limiting. Use the append rollup window to control the publish rate. See token streaming.
- Channel message rate. Concurrent turns and multi-agent setups share the channel's rate budget.
- Message size. Large tool outputs or attachments approach the per-message size limit. Stream large results or persist them externally and reference the URL.
- Channel history retention. Configure retention through the channel namespace settings. Plan for the longest session you expect to hydrate from Ably alone.
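The message size item can be sketched as a guard that decides whether a tool result goes inline or is replaced with a reference to an external copy. The 64 KiB figure, the headroom factor, and every name below are assumptions for illustration; take the real limit for your package from the platform limits page.

```typescript
// Hypothetical figure for illustration only: check the platform limits page
// for the real per-message size limit on your package.
const ASSUMED_MAX_MESSAGE_BYTES = 64 * 1024;

// UTF-8 byte length of a string, assuming well-formed UTF-16 input.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // high surrogate: the pair encodes one 4-byte code point
      i++; // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

type ToolResultPayload =
  | { kind: "inline"; data: string }
  | { kind: "reference"; url: string };

// Sends the result inline when it fits within a safety margin of the limit;
// otherwise returns a reference to a copy the caller has persisted externally.
function packToolResult(
  serialized: string,
  externalUrl: string,
  maxBytes: number = ASSUMED_MAX_MESSAGE_BYTES,
  headroom: number = 0.8
): ToolResultPayload {
  const budget = Math.floor(maxBytes * headroom);
  if (utf8ByteLength(serialized) <= budget) {
    return { kind: "inline", data: serialized };
  }
  return { kind: "reference", url: externalUrl };
}
```

The headroom leaves space for message metadata that also counts toward the limit; the exact overhead is not modelled here.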
Monitoring and observability
Watch these signals:
- Channel publish rate per session, against the connection's inbound rate limit.
- Turn lifecycle outcomes (completed, cancelled, failed). A rising failure rate is your first signal that something is wrong with model orchestration, auth, or capacity.
- Server-side LLM latency and error rate, from your model provider. The Ably side is decoupled from the LLM; track them separately.
Wire these into your existing dashboards through Ably's integrations. Webhooks, Kafka, and other sinks expose channel activity. Ably also offers a Stats API for usage data.
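As a minimal sketch of the turn-outcome signal, the helper below tallies outcomes over a window and derives a failure rate. The type names, the alert threshold, and the choice to count cancelled turns in the denominator are all assumptions; adapt them to how your dashboards define the metric.

```typescript
type TurnOutcome = "completed" | "cancelled" | "failed";

interface OutcomeSummary {
  completed: number;
  cancelled: number;
  failed: number;
  failureRate: number; // failed turns divided by all turns in the window
}

// Tallies the outcomes observed in one monitoring window.
function summariseOutcomes(outcomes: TurnOutcome[]): OutcomeSummary {
  const summary: OutcomeSummary = { completed: 0, cancelled: 0, failed: 0, failureRate: 0 };
  for (let i = 0; i < outcomes.length; i++) {
    summary[outcomes[i]] += 1;
  }
  summary.failureRate = outcomes.length === 0 ? 0 : summary.failed / outcomes.length;
  return summary;
}

// Hypothetical alert rule: fire when the failure rate crosses a threshold.
function shouldAlert(summary: OutcomeSummary, threshold: number = 0.05): boolean {
  return summary.failureRate > threshold;
}
```

Feed it the outcomes your server records at the end of each turn, and keep the LLM provider's own latency and error metrics on a separate panel.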
Auth hardening
The defaults are safe for development. For production:
- Scope channel capabilities to the specific conversation channel (or the namespace) the user needs access to. Grant publish, subscribe, and history only on channels the user owns.
- Set clientId in the token, not on the client. The Ably service verifies tokens; client-set IDs are unverified and spoofable.
- Use short token lifetimes and authUrl (or authCallback) so the SDK refreshes tokens automatically. One-hour tokens are a reasonable default.
- Implement token revocation for forced sign-out.
- For cancel authorisation, set the onCancel hook on every turn. Reject cancels from clients that do not own the turn unless you specifically want admin-style global cancels.
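The scoping and cancel rules above can be sketched as two small server-side helpers. The token fields mirror Ably token parameters (clientId, capability, and a ttl in milliseconds), but the channel naming scheme, the function names, and the way the ownership check is wired into onCancel are assumptions; the hook's actual signature is not shown here.

```typescript
type Operation = "publish" | "subscribe" | "history";

interface TokenParamsSketch {
  clientId: string; // set server-side, so the client cannot spoof it
  capability: { [channel: string]: Operation[] };
  ttl: number; // token lifetime in milliseconds
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Builds token parameters scoped to a single conversation channel.
// The "conversations:<user>:<conversation>" naming scheme is hypothetical.
function buildTokenParams(userId: string, conversationId: string): TokenParamsSketch {
  const channel = "conversations:" + userId + ":" + conversationId;
  const capability: { [channel: string]: Operation[] } = {};
  capability[channel] = ["publish", "subscribe", "history"];
  return { clientId: userId, capability: capability, ttl: ONE_HOUR_MS };
}

// Ownership check intended to be called from your onCancel hook.
// Only the turn owner may cancel, unless the requester is an explicit admin.
function isCancelAllowed(
  turnOwnerClientId: string,
  requesterClientId: string,
  adminClientIds: string[] = []
): boolean {
  if (requesterClientId === turnOwnerClientId) {
    return true;
  }
  return adminClientIds.indexOf(requesterClientId) !== -1;
}
```

Leaving the admin list empty by default means global cancels are something you opt into deliberately.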
See authentication for the full setup.
Data shipping and retention
If you persist conversations beyond the channel's retention:
- Persist completed turns to your own store on turn.end(). Hydrate sessions from the store on demand; the channel handles live activity.
- Channel history retention is configured per namespace. Pick a value that matches your live-recovery window, not your total retention target. The external store covers anything longer.
- For analytics or search, stream channel activity to your data warehouse through webhooks or Kafka. This is one-way; AI Transport reads from the channel, not the warehouse.
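A minimal sketch of the persist-and-hydrate split, using an in-memory map as a stand-in for your database. The StoredTurn shape, the class, and the hydrationSource helper are all hypothetical; the save call is meant to run at the point where your code calls turn.end().

```typescript
interface StoredTurn {
  turnId: string;
  clientId: string;
  endedAt: number; // epoch milliseconds
  messages: string[];
}

// Stand-in for an external store. Saves are idempotent per turnId so a
// retried persist does not duplicate the turn.
class InMemoryTurnStore {
  private turns: { [sessionId: string]: StoredTurn[] } = {};

  save(sessionId: string, turn: StoredTurn): void {
    const existing = this.turns[sessionId] || [];
    for (let i = 0; i < existing.length; i++) {
      if (existing[i].turnId === turn.turnId) {
        existing[i] = turn; // same turn again: overwrite rather than append
        this.turns[sessionId] = existing;
        return;
      }
    }
    existing.push(turn);
    this.turns[sessionId] = existing;
  }

  load(sessionId: string): StoredTurn[] {
    return (this.turns[sessionId] || []).slice();
  }
}

// Chooses where to hydrate from: the channel while the session is still
// inside the retention window, the external store once it has aged out.
function hydrationSource(
  lastActivityAt: number,
  now: number,
  channelRetentionMs: number
): "channel" | "store" {
  return now - lastActivityAt <= channelRetentionMs ? "channel" : "store";
}
```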
Compliance
Ably is SOC 2 Type II certified and HIPAA compliant. See security and compliance for the current certifications and how to enable HIPAA-eligible workloads.
For application-side compliance:
- Log message metadata (turn IDs, client IDs, timestamps), not message contents, by default. Surface content logs through an opt-in path with the right retention.
- Treat tool inputs and outputs as conversation content. They are visible to every subscriber by capability.
- For region-pinned data residency, scope capabilities to namespaces hosted in the right region. See Ably's data residency.
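The metadata-only logging default can be sketched as a mapping function that drops message content unless the caller opts in. The TurnMessage and LogRecord shapes are hypothetical, not AI Transport types.

```typescript
interface TurnMessage {
  turnId: string;
  clientId: string;
  timestamp: number; // epoch milliseconds
  content: string; // conversation content, including tool inputs and outputs
}

interface LogRecord {
  turnId: string;
  clientId: string;
  timestamp: string; // ISO 8601
  content?: string; // present only on the opt-in path
}

// Builds a log record that carries metadata only, unless content logging
// has been explicitly enabled for this call.
function toLogRecord(message: TurnMessage, includeContent: boolean = false): LogRecord {
  const record: LogRecord = {
    turnId: message.turnId,
    clientId: message.clientId,
    timestamp: new Date(message.timestamp).toISOString(),
  };
  if (includeContent) {
    record.content = message.content;
  }
  return record;
}
```

Routing opt-in content logs to a separate sink makes it easier to give them their own retention policy.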
Deployment notes
A few patterns that come up:
- Serverless agents are fine. Each turn creates a server transport, processes the turn, and tears down. The session is on the channel, not in the agent process.
- after() (Next.js) or equivalent post-response continuation is what lets the HTTP route return before the model finishes streaming. Without it, the streaming budget is bound to the request timeout.
- Multiple agent regions are supported. Ably's infrastructure keeps state present in every region; an agent in one region publishes and clients in any region see the same state with low latency from their nearest edge.
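The post-response continuation pattern can be sketched without a framework by injecting the scheduler. In Next.js the scheduler role would be played by after(); here it is a plain function parameter so the sketch runs anywhere. The names startTurn, streamTurn, and RouteResult are hypothetical.

```typescript
type Continuation = () => Promise<void>;
type Scheduler = (task: Continuation) => void;

interface RouteResult {
  status: number;
  body: { turnId: string };
}

// Hands the long-running turn to the scheduler and returns immediately,
// so the HTTP response is not held open while the model streams.
function startTurn(
  turnId: string,
  streamTurn: (turnId: string) => Promise<void>,
  schedule: Scheduler
): RouteResult {
  schedule(function () {
    return streamTurn(turnId);
  });
  return { status: 202, body: { turnId: turnId } };
}
```

The route returns as soon as the work is scheduled; the client follows progress on the channel rather than on the HTTP response.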
When adopting AI Transport in production, also consider
Each feature page has its own edge cases. The cross-cutting items to walk through:
- Token streaming: rollup tuning, partial responses on cancel.
- Cancellation: cancel authorisation, abort signal handling in tools.
- Reconnection and recovery: the live recovery window, history capability scoping.
- Multi-device sessions: clientId uniqueness, capability scoping per user.
- Tool calling: tool timeouts honouring abort signals, large tool outputs.
- Human-in-the-loop: pending approvals that never resolve.
Read next
- Authentication: the full auth setup.
- Infrastructure: the platform guarantees you are relying on.