Guide: Stream Anthropic responses using the message-per-token pattern


This guide shows you how to stream AI responses from Anthropic's Messages API over Ably using the message-per-token pattern. Specifically, it implements the explicit start/stop events approach, which publishes each response token as an individual message, along with explicit lifecycle events to signal when responses begin and end.

Distributing tokens from the Anthropic SDK over Ably lets you broadcast AI responses to thousands of concurrent subscribers. Ably's delivery and ordering guarantees ensure that each client receives the complete response stream, with every token in order. This approach also decouples AI inference from client connections, so you can scale agents independently and handle client reconnections gracefully.

Prerequisites

Node.js 20 or higher is required.

You also need:

An Anthropic API key
An Ably API key


Agent setup

Create a new Node project for the agent code:

mkdir ably-anthropic-agent && cd ably-anthropic-agent
npm init -y
npm install @anthropic-ai/sdk ably

Export your Anthropic API key to the environment:

export ANTHROPIC_API_KEY="your_api_key_here"

Client setup

Create a new Node project for the client code, or use the same project as the agent if both are JavaScript:

mkdir ably-anthropic-client && cd ably-anthropic-client
npm init -y
npm install ably

Step 1: Get a streamed response from Anthropic

Initialize an Anthropic client and use the Messages API to stream model output as a series of events.

In your ably-anthropic-agent directory, create a new file called agent.mjs with the following contents:

Agent

JavaScript

import Anthropic from '@anthropic-ai/sdk';

// Initialize Anthropic client (reads the API key from ANTHROPIC_API_KEY)
const anthropic = new Anthropic();

// Process each streaming event
function processEvent(event) {
  console.log(JSON.stringify(event));
  // This function is updated in the next sections
}

// Create streaming response from Anthropic
async function streamAnthropicResponse(prompt) {
  const stream = await anthropic.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  // Iterate through streaming events
  for await (const event of stream) {
    processEvent(event);
  }
}

// Usage example
streamAnthropicResponse("Tell me a short joke");

Understand Anthropic streaming events

Anthropic's Messages API streams model output as a series of events when you set stream: true. Each event has a type property that identifies the kind of event. You can reconstruct a complete text response from the following event types:

message_start: Signals the start of a response. Contains a message object with an id to correlate subsequent events.
content_block_start: Indicates the start of a new content block. For text responses, the content_block will have type: "text"; other types may be specified, such as "thinking" for internal reasoning tokens. The index indicates the position of this item in the message's content array.
content_block_delta: Contains an incremental update to a content block in the delta field. If delta.type === "text_delta", delta.text contains a chunk of model response text, which may span more than one token; other types may be specified, such as "thinking_delta" for internal reasoning tokens. Use the index to correlate deltas with a specific content block.
content_block_stop: Signals completion of a content block. Contains the index that identifies the content block.
message_delta: Contains message-level updates streamed near the end of the response. Includes delta.stop_reason, which indicates why the model stopped generating (for example, end_turn when it finished naturally or max_tokens when it reached the token limit), along with cumulative usage statistics.
message_stop: Signals the end of the response.

The following example shows the event sequence received when streaming a response:

JSON

// 1. Message starts
{"type":"message_start","message":{"model":"claude-sonnet-4-5-20250929","id":"msg_016hhjrqVK4rCZ2uEGdyWfmt","type":"message","role":"assistant","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":12,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":0},"output_tokens":1,"service_tier":"standard"}}}

// 2. Content block starts
{"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

// 3. Text tokens stream in as delta events
{"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Why"}}
{"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" don't scientists trust atoms?\n\nBecause"}}
{"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" they make up everything!"}}

// 4. Content block completes
{"type":"content_block_stop","index":0}

// 5. Message delta (usage stats)
{"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":12,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":17}}

// 6. Message completes
{"type":"message_stop"}
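To see how these events fit together, the following sketch folds the sequence above back into the final response text, with no network calls. It is a minimal illustration rather than part of the Anthropic SDK: reconstructText is a hypothetical helper, and it assumes response text only arrives in text_delta deltas, grouped by content block index.

```javascript
// Hypothetical helper: rebuild the text of a streamed Anthropic response
// from its raw events. Text is grouped by content block index so that
// separate blocks (for example "thinking" followed by "text") stay apart.
function reconstructText(events) {
  const blocks = new Map(); // content block index -> { type, text }
  let messageId = null;
  let stopReason = null;

  for (const event of events) {
    switch (event.type) {
      case 'message_start':
        messageId = event.message.id;
        break;
      case 'content_block_start':
        blocks.set(event.index, { type: event.content_block.type, text: '' });
        break;
      case 'content_block_delta':
        // Only text_delta carries model response text in delta.text
        if (event.delta.type === 'text_delta') {
          const block = blocks.get(event.index);
          if (block) block.text += event.delta.text;
        }
        break;
      case 'message_delta':
        stopReason = event.delta.stop_reason;
        break;
    }
  }

  // Join the text blocks in index order, skipping non-text blocks
  const text = [...blocks.entries()]
    .sort(([a], [b]) => a - b)
    .filter(([, block]) => block.type === 'text')
    .map(([, block]) => block.text)
    .join('');
  return { messageId, stopReason, text };
}

// The event sequence from the example above, with the message object trimmed
const events = [
  { type: 'message_start', message: { id: 'msg_016hhjrqVK4rCZ2uEGdyWfmt', content: [] } },
  { type: 'content_block_start', index: 0, content_block: { type: 'text', text: '' } },
  { type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text: 'Why' } },
  { type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text: " don't scientists trust atoms?\n\nBecause" } },
  { type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text: ' they make up everything!' } },
  { type: 'content_block_stop', index: 0 },
  { type: 'message_delta', delta: { stop_reason: 'end_turn', stop_sequence: null } },
  { type: 'message_stop' },
];

const result = reconstructText(events);
console.log(result.text);
```

Grouping by index rather than appending every delta to one string keeps the sketch correct if the model emits a thinking block before its text block.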

Step 2: Publish streaming events to Ably

Publish Anthropic streaming events to Ably to reliably and scalably distribute them to subscribers.

This implementation follows the explicit start/stop events pattern, which provides clear response boundaries.

Initialize the Ably client

Add the Ably client initialization to your agent file:

Agent

JavaScript

import Ably from 'ably';

// Initialize Ably Realtime client
const realtime = new Ably.Realtime({
  key: 'demokey:*****',
  echoMessages: false
});

// Create a channel for publishing streamed AI responses
const channel = realtime.channels.get('who-tax-beg');

Replace demokey:***** with your own Ably API key. Embedding a key in code like this is for demonstration only.

The Ably Realtime client maintains a persistent connection to the Ably service, which allows you to publish tokens at high message rates with low latency.
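In the agent code, channel.publish() returns a promise that is not awaited, so tokens go out without waiting for the previous token's acknowledgement. One way to keep that throughput while still noticing failed publishes is sketched below. It is an assumption-laden illustration, not an Ably API: createPublisher is a hypothetical wrapper, and the in-memory fakeChannel stands in for a real Ably channel so the example runs offline.

```javascript
// Hypothetical wrapper: publish without awaiting each acknowledgement,
// but keep the returned promises so failures can be reported later.
// `channel` is anything whose publish(message) returns a Promise,
// such as an Ably Realtime channel.
function createPublisher(channel) {
  const pending = [];

  return {
    // Start the publish immediately and remember its promise
    publish(message) {
      pending.push(channel.publish(message));
    },
    // Wait for all outstanding publishes to settle; return the errors
    async flush() {
      const batch = pending.splice(0); // take a snapshot and clear the queue
      const results = await Promise.allSettled(batch);
      return results
        .filter((r) => r.status === 'rejected')
        .map((r) => r.reason);
    },
  };
}

// Hypothetical in-memory channel: records every message and rejects
// any message named 'fail', to simulate a failed publish
const published = [];
const fakeChannel = {
  publish(message) {
    published.push(message);
    return message.name === 'fail'
      ? Promise.reject(new Error('publish failed'))
      : Promise.resolve();
  },
};

const publisher = createPublisher(fakeChannel);
publisher.publish({ name: 'start' });
publisher.publish({ name: 'token', data: 'Hello' });
publisher.publish({ name: 'stop' });
publisher.flush().then((errors) => {
  console.log(`${published.length} published, ${errors.length} failed`);
});
```

In the agent, processEvent() could call publisher.publish() in place of channel.publish(), with await publisher.flush() after the streaming loop finishes.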

Map Anthropic streaming events to Ably messages

Next, decide how to map Anthropic streaming events to Ably messages. Any mapping strategy that suits your application will work; this guide uses the following message names:

start: Signals the beginning of a response
token: Contains the incremental text content for each delta
stop: Signals the completion of a response

Update the processEvent() function in your agent file to publish events to Ably:

Agent

JavaScript

// Track state across events
let responseId = null;

// Process each streaming event and publish to Ably
function processEvent(event) {
  switch (event.type) {
    case 'message_start':
      // Capture message ID when response starts
      responseId = event.message.id;

      // Publish start event
      channel.publish({
        name: 'start',
        extras: {
          headers: { responseId }
        }
      });
      break;

    case 'content_block_delta':
      // Publish tokens from text deltas only
      if (event.delta.type === 'text_delta') {
        channel.publish({
          name: 'token',
          data: event.delta.text,
          extras: {
            headers: { responseId }
          }
        });
      }
      break;

    case 'message_stop':
      // Publish stop event when response completes
      channel.publish({
        name: 'stop',
        extras: {
          headers: { responseId }
        }
      });
      break;
  }
}

This implementation:

Publishes a start event when the response begins
Filters for content_block_delta events with text_delta type and publishes them as token events
Publishes a stop event when the response completes
All published events include the responseId in message extras to allow the client to correlate events relating to a particular response
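If you want to unit test the mapping without an Ably connection or an Anthropic API key, one option is to move it into a pure function that returns the message to publish, or null. The sketch below assumes that refactor; toAblyMessage is a hypothetical name, and the optional stopReason header, taken from the message_delta event, is an addition to the message shape used in this guide.

```javascript
// Hypothetical pure version of processEvent(): maps one Anthropic event to
// the Ably message to publish, or null when nothing should be published.
// `state` carries the responseId (and stop reason) across events.
function toAblyMessage(event, state) {
  switch (event.type) {
    case 'message_start':
      state.responseId = event.message.id;
      state.stopReason = null;
      return { name: 'start', extras: { headers: { responseId: state.responseId } } };

    case 'content_block_delta':
      // Skip non-text deltas such as thinking_delta
      if (event.delta.type !== 'text_delta') return null;
      return {
        name: 'token',
        data: event.delta.text,
        extras: { headers: { responseId: state.responseId } },
      };

    case 'message_delta':
      // Assumption: remember why generation stopped (e.g. 'end_turn', 'max_tokens')
      state.stopReason = event.delta.stop_reason;
      return null;

    case 'message_stop': {
      const headers = { responseId: state.responseId };
      if (state.stopReason) headers.stopReason = state.stopReason;
      return { name: 'stop', extras: { headers } };
    }

    default:
      return null;
  }
}

// processEvent() would then reduce to:
//   const message = toAblyMessage(event, state);
//   if (message) channel.publish(message);
const state = { responseId: null, stopReason: null };
const sample = [
  { type: 'message_start', message: { id: 'msg_1' } },
  { type: 'content_block_start', index: 0, content_block: { type: 'text', text: '' } },
  { type: 'content_block_delta', index: 0, delta: { type: 'text_delta', text: 'Hello' } },
  { type: 'content_block_stop', index: 0 },
  { type: 'message_delta', delta: { stop_reason: 'end_turn', stop_sequence: null } },
  { type: 'message_stop' },
];
const messages = sample.map((e) => toAblyMessage(e, state)).filter(Boolean);
console.log(messages.map((m) => m.name).join(','));
```

Passing the stop reason to subscribers lets a client flag a response that ended because of max_tokens rather than presenting it as complete.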

Run the publisher to see tokens streaming to Ably:

cd ably-anthropic-agent
node agent.mjs

Step 3: Subscribe to streaming tokens

Create a subscriber that receives the streaming events from Ably and reconstructs the response.

In your ably-anthropic-client project directory, create a new file called client.mjs with the following contents:

Client

JavaScript

import Ably from 'ably';

// Initialize Ably Realtime client
const realtime = new Ably.Realtime({ key: 'demokey:*****' });

// Get the same channel used by the publisher
const channel = realtime.channels.get('who-tax-beg');

// Track responses by ID
const responses = new Map();

// Handle response start
await channel.subscribe('start', (message) => {
  const responseId = message.extras?.headers?.responseId;
  console.log('\n[Response started]', responseId);
  responses.set(responseId, '');
});

// Handle tokens
await channel.subscribe('token', (message) => {
  const responseId = message.extras?.headers?.responseId;
  const token = message.data;

  // Append token to response
  const currentText = responses.get(responseId) || '';
  responses.set(responseId, currentText + token);

  // Display token as it arrives
  process.stdout.write(token);
});

// Handle response stop
await channel.subscribe('stop', (message) => {
  const responseId = message.extras?.headers?.responseId;
  // The complete response text, ready to store or render
  const finalText = responses.get(responseId);
  console.log('\n[Response completed]', responseId);
});

console.log('Subscriber ready, waiting for tokens...');

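Replace demokey:***** with the same Ably API key used in the agent. Embedding a key in code like this is for demonstration only.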

Run the subscriber in a separate terminal:

cd ably-anthropic-client
node client.mjs

With the subscriber running, run the publisher in another terminal. The tokens stream in realtime as the Anthropic model generates them.

Step 4: Stream with multiple publishers and subscribers

Ably's channel-oriented sessions enable multiple AI agents to publish responses, and multiple users to receive them, on a single channel simultaneously. Ably handles message delivery to all participants, so you don't need to implement routing logic or synchronize state across connections.

Broadcasting to multiple subscribers

Each subscriber receives the complete stream of tokens independently, enabling you to build collaborative experiences or multi-device applications.

Run a subscriber in multiple separate terminals:

# Terminal 1
cd ably-anthropic-client && node client.mjs

# Terminal 2
cd ably-anthropic-client && node client.mjs

# Terminal 3
cd ably-anthropic-client && node client.mjs

All subscribers receive the same stream of tokens in realtime.

Publishing concurrent responses

The implementation uses responseId in message extras to correlate tokens with their originating response. This enables multiple publishers to stream different responses concurrently on the same channel, with each subscriber correctly tracking all responses independently.

To demonstrate this, run a publisher in multiple separate terminals:

# Terminal 1
cd ably-anthropic-agent && node agent.mjs

# Terminal 2
cd ably-anthropic-agent && node agent.mjs

# Terminal 3
cd ably-anthropic-agent && node agent.mjs

All running subscribers receive tokens from all responses concurrently. Each subscriber correctly reconstructs each response separately using the responseId to correlate tokens.
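The per-response bookkeeping in client.mjs can be sketched as a small class and exercised with interleaved messages from two concurrent responses. This is a hypothetical refactor, not an Ably API: ResponseAssembler is an illustrative name, and it assumes messages have the { name, data, extras.headers.responseId } shape published by the agent in this guide.

```javascript
// Hypothetical helper: assembles start/token/stop messages into complete
// responses, keyed by responseId, so interleaved responses stay separate.
class ResponseAssembler {
  constructor() {
    this.inProgress = new Map(); // responseId -> text received so far
    this.completed = new Map();  // responseId -> final text
  }

  handle(message) {
    const responseId = message.extras?.headers?.responseId;
    if (!responseId) return; // ignore messages without a correlation ID

    switch (message.name) {
      case 'start':
        this.inProgress.set(responseId, '');
        break;
      case 'token':
        this.inProgress.set(responseId, (this.inProgress.get(responseId) ?? '') + message.data);
        break;
      case 'stop':
        // Move the finished text out of the in-progress map
        this.completed.set(responseId, this.inProgress.get(responseId) ?? '');
        this.inProgress.delete(responseId);
        break;
    }
  }
}

// Two responses whose messages arrive interleaved on the same channel
const msg = (name, responseId, data) => ({ name, data, extras: { headers: { responseId } } });
const assembler = new ResponseAssembler();
[
  msg('start', 'a'), msg('start', 'b'),
  msg('token', 'a', 'Hello'), msg('token', 'b', 'Bonjour'),
  msg('token', 'a', ', world'), msg('stop', 'b'),
  msg('token', 'a', '!'), msg('stop', 'a'),
].forEach((m) => assembler.handle(m));

console.log(assembler.completed.get('a')); // prints Hello, world!
console.log(assembler.completed.get('b')); // prints Bonjour
```

In client.mjs, a single listener such as channel.subscribe((message) => assembler.handle(message)) could replace the three named subscriptions.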

Next steps

Learn more about the message-per-token pattern used in this guide
Learn about client hydration strategies for handling late joiners and reconnections
Understand sessions and identity in AI-enabled applications
Explore the message-per-response pattern for storing complete AI responses as single messages in history
