Updated Apr 8, 2026

Ably Python SDK v3: realtime for Python, built for AI

Python dominates AI development. It's where teams build their agents, orchestration layers, and the backend systems that turn LLM calls into products people actually use. Over the past year, those systems have matured rapidly. What used to live in notebooks and prototypes is now running in production, serving real users with real expectations around reliability and performance.

That maturity brings infrastructure requirements. Tokens need to stream in order. Sessions need to survive refreshes, reconnects, and device switches. Delivery needs to be reliable even when networks aren't. These aren't features you bolt on later; they're foundational to the user experience.

We've been watching this shift closely, and v3 is our response: a complete rebuild of the Python SDK, designed for production AI infrastructure.

What's new in v3

Starting with the big one.

AI Transport support

Most AI apps start with a simple streaming setup: open a connection, stream tokens, render as they arrive. It works in demos. In production, it gets fragile fast.

Users refresh mid-response. Networks drop. People switch devices. Tabs get backgrounded. Each of these breaks the stream, and teams end up rebuilding the same recovery logic: buffering, replay, reconnection handling, session state management. It's necessary work, but it's not differentiation.

AI Transport handles that layer so you don't have to. And with v3, it's now fully supported in Python:

  • Token streaming with ordering guarantees. Without delivery order guarantees, tokens arrive out of sequence and clients have no reliable way to reconstruct the response. v3 streams LLM output token-by-token with ordering preserved, so what the user sees matches what the model generated.
  • Resumable sessions. When a connection drops mid-stream over a standard HTTP connection, the response is gone. The user re-prompts, you pay for the same tokens twice, and they learn not to trust the product for anything that takes more than a few seconds. Resumable sessions persist output in the channel so a reconnecting client catches up from its last received position, whether the cause was a refresh, a network switch, or a crash.
  • Multi-device continuity. When a conversation is tied to a single connection, switching devices means starting over. Channel-based sessions let any device subscribe and catch up from where the conversation left off, because the state lives in the channel, not the client.
  • Live steering. Long responses often go in the wrong direction. Without bi-directional messaging on the same channel, users are stuck in rigid turn-taking: waiting for a wrong answer to finish before they can correct it. Live steering lets users interrupt or redirect mid-response without breaking conversation state or forcing a stop-and-reprompt loop.
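
The resumable-session idea above can be sketched in miniature. The names below are hypothetical and the buffer is stdlib-only; with AI Transport this state lives in the channel, so you don't implement it yourself:

```python
class SessionBuffer:
    """Minimal sketch of the state a resumable session needs (hypothetical
    names; AI Transport persists this in the channel for you)."""

    def __init__(self):
        self.events = []      # (serial, token), serials strictly increasing
        self.next_serial = 1

    def append(self, token):
        # Every token gets a monotonically increasing serial, which is what
        # makes ordered delivery and resume-from-position possible.
        self.events.append((self.next_serial, token))
        self.next_serial += 1

    def replay_from(self, last_seen):
        # A reconnecting client reports the last serial it received and
        # gets back only the tokens it missed, in order.
        return [token for serial, token in self.events if serial > last_seen]
```

A client that saw serials 1 and 2 before a refresh asks for `replay_from(2)` and resumes mid-response instead of re-prompting, which is the behavior the bullets above describe.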

If you're building AI applications with a Python backend, this is probably what you came here for. Read the AI Transport docs to go deeper.

Beyond AI Transport

Three capabilities that underpin AI Transport and are useful in their own right.

Realtime publishing. Publish messages over a persistent WebSocket connection instead of individual REST calls. This matters when you're streaming at high volume. With REST, per-request overhead becomes a bottleneck at token-by-token rates, and ordering across concurrent requests gets complicated fast. A persistent connection removes both problems.
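
To make the ordering point concrete, here is a hypothetical, stdlib-only sketch of the reordering buffer subscribers end up writing when publishes race over concurrent REST requests. A single persistent connection serializes writes, which makes this machinery unnecessary:

```python
import heapq

class Reorderer:
    """What a subscriber must do when messages can arrive out of order,
    e.g. from concurrent REST publishes (hypothetical sketch)."""

    def __init__(self):
        self.heap = []
        self.next_seq = 0

    def receive(self, seq, token):
        # Buffer out-of-order arrivals, then release the longest contiguous
        # run starting at the next expected sequence number.
        heapq.heappush(self.heap, (seq, token))
        released = []
        while self.heap and self.heap[0][0] == self.next_seq:
            _, tok = heapq.heappop(self.heap)
            released.append(tok)
            self.next_seq += 1
        return released
```

If sequence 1 lands before sequence 0, nothing can be delivered until 0 arrives; at token-by-token rates this buffering and stalling is exactly the complexity a persistent connection avoids.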

Presence. Track who's connected to a channel. Enter, leave, update presence data. For AI applications, this matters for cost as much as UX: without presence, an agent has no reliable way to know whether anyone is actually receiving its output. It keeps streaming to an empty room, burning compute and model budget on responses nobody sees. Presence gives your backend a live signal to pause or downgrade agent activity when users disconnect.
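
The cost-control pattern can be sketched as a small gate (hypothetical names, stdlib-only). With Ably, presence enter and leave events drive the transitions; the gating logic is all that's left to write:

```python
class PresenceGate:
    """Pause agent output while nobody is listening (hypothetical sketch;
    presence enter/leave events would drive these transitions)."""

    def __init__(self):
        self.members = set()

    def on_enter(self, client_id):
        self.members.add(client_id)

    def on_leave(self, client_id):
        self.members.discard(client_id)

    def should_stream(self):
        # The agent checks this before paying for the next model call.
        return bool(self.members)
```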

Mutable messages. Edit, delete, and append to messages after publishing. For AI applications, this means streaming tokens into a single message that grows over time, rather than publishing thousands of individual token messages. Without it, subscribers receive a flood of single-token events and history becomes nearly impossible to hydrate efficiently on reconnect.

Example: token streaming with message-per-response

from ably import AblyRealtime, Message

async def stream_response():
    realtime = AblyRealtime('your-api-key')
    channel = realtime.channels.get('ai:session-123')
    
    # Publish initial message and capture the serial
    message = Message(name='response', data='')
    result = await channel.publish(message)
    msg_serial = result.serials[0]
    
    # Stream tokens from your LLM - don't await for throughput
    async for token in your_llm_stream():
        channel.append_message(serial=msg_serial, data=token)

Subscribers receive tokens as they arrive. History contains the complete concatenated response. Clients that reconnect can hydrate from history without replaying every individual token.

What you can build with v3

AI agents in Python. Your LangChain, LlamaIndex, or custom agent framework can now stream output directly through Ably with delivery guarantees. No REST batching, no hand-rolled reconnection logic.

Backend-to-backend realtime. Python services can subscribe to channels and receive updates in realtime. Useful for B2B scenarios where external clients need reliable communication with your servers: for example, an AI platform streaming responses to customers who integrate your API.

Desktop and CLI applications. Build Python applications with live updating interfaces. The SDK handles connection management, reconnection, and message recovery.

Getting started

Install or upgrade via pip:

pip install --upgrade ably

Or check out the Python SDK docs to get everything you need.

Looking ahead

This release brings Python to feature parity with our other realtime SDKs. It's also the first version designed from the ground up for production AI infrastructure.

LiveObjects support is on the roadmap, which will bring shared state sync to Python - useful for the same session coordination patterns described above. We're also working on tighter integrations with specific agent frameworks; more on that when there's something concrete to show.

If you're building AI infrastructure in Python, we'd like to hear how it goes. Feedback on real-world usage shapes where the SDK goes next.


