Chain of thought

Your users see the agent's reasoning as it streams, side by side with the response. AI Transport multiplexes reasoning and text streams within the same turn.

Chain of thought streams the model's reasoning alongside the main response text. The codec supports multiple stream types within a single turn, so text and reasoning are delivered as separate streams that render independently in the UI.

How it works

When an LLM produces reasoning or thinking tokens, the codec multiplexes them alongside text tokens on the same Ably channel. Each stream type is tagged so the client routes reasoning content to one part of the UI and response text to another.
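
The routing step can be sketched in a few lines. This is a minimal illustration, not the codec's implementation: the chunk shape (`type`, `delta`) and the `demultiplex` helper are assumptions made for the example and do not describe the actual wire format.

```javascript
// Assumed chunk shape for illustration only:
// { type: 'reasoning' | 'text', delta: string }
function demultiplex(chunks) {
  const streams = { reasoning: '', text: '' };
  for (const chunk of chunks) {
    // Route each chunk by its tag; ignore stream types this sketch does not know.
    if (chunk.type === 'reasoning' || chunk.type === 'text') {
      streams[chunk.type] += chunk.delta;
    }
  }
  return streams;
}

// Chunks arrive interleaved on one channel but accumulate separately.
const interleaved = [
  { type: 'reasoning', delta: 'The user wants a sum. ' },
  { type: 'text', delta: 'The answer ' },
  { type: 'reasoning', delta: '2 + 2 = 4.' },
  { type: 'text', delta: 'is 4.' },
];

const streams = demultiplex(interleaved);
console.log(streams.reasoning); // "The user wants a sum. 2 + 2 = 4."
console.log(streams.text); // "The answer is 4."
```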

With the Vercel AI SDK integration, reasoning arrives as a separate reasoning stream type within the UI message stream:

JavaScript

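import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// `app` (an Express-style server) and `transport` are assumed to be set up elsewhere.
app.post('/api/chat', async (req, res) => {
  const { turnId, clientId, messages } = req.body;

  // Open a turn; its abort signal is shared by the text and reasoning streams.
  const turn = transport.newTurn({ turnId, clientId });

  const result = streamText({
    model: anthropic('claude-sonnet-4-20250514'),
    messages,
    abortSignal: turn.abortSignal,
  });

  // Text and reasoning both travel through the UI message stream.
  await turn.streamResponse(result.toUIMessageStream());
  await turn.end('complete');
  res.json({ ok: true });
});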

No additional server configuration is needed. If the model produces reasoning tokens, the codec encodes them as a distinct stream within the turn.
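
Whether a model emits reasoning at all is a provider setting rather than a transport one. As a hedged sketch, the AI SDK's Anthropic provider enables extended thinking through `providerOptions`. The `thinkingOptions` name and the budget value below are illustrative choices, not recommendations from this guide.

```javascript
// Provider options for the AI SDK's Anthropic provider.
// The token budget is an arbitrary example value.
const thinkingOptions = {
  providerOptions: {
    anthropic: {
      thinking: { type: 'enabled', budgetTokens: 12000 },
    },
  },
};

// Spread into the existing streamText call, for example:
// streamText({ model, messages, abortSignal: turn.abortSignal, ...thinkingOptions })
console.log(JSON.stringify(thinkingOptions, null, 2));
```

Check the provider documentation for the models and budget limits that apply to your account.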

Display reasoning in the UI

On the client, message nodes contain both text and reasoning content. Render them separately to show the agent's thinking process:

JavaScript

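// Subscribe to the conversation view; parts update as both streams arrive.
const { nodes } = useView({ transport });

for (const node of nodes) {
  for (const part of node.message.parts) {
    if (part.type === 'reasoning') {
      // Reasoning goes to its own panel, separate from the response.
      renderThinkingPanel(part.reasoning);
    } else if (part.type === 'text') {
      renderResponsePanel(part.text);
    }
  }
}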

Both streams update in real time. Users see the reasoning appear as the model thinks, followed by (or alongside) the response text.

Edge cases and unhappy paths

  • If a model produces reasoning but the codec does not surface it, the reasoning is folded into the text stream. Update the codec or the framework integration if you need it separated.
  • Reasoning output is often longer than the final response. Reasoning tokens count toward the channel's message rate and storage like any other tokens. See token streaming for rollup tuning.
  • A cancelled turn aborts both streams. Partial reasoning content is retained with the status aborted.
  • Multiple reasoning streams in the same turn are exposed in the order the model emitted them. Distinct reasoning episodes within one turn are common with tool-augmented agents.
  • A client that does not render reasoning parts still receives them on the channel. Filter at the render layer if you want to hide them by default.
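
When a turn contains several reasoning episodes, it can help to number them in emission order before rendering. The sketch below assumes the part shape used in the rendering example above (`type`, `reasoning`, `text`). The `labelReasoningEpisodes` helper, the `label` field, and the `tool-lookup` part type are hypothetical names introduced for illustration.

```javascript
// Number reasoning parts in the order they appear, leaving other parts untouched.
function labelReasoningEpisodes(parts) {
  let episode = 0;
  return parts.map((part) => {
    if (part.type !== 'reasoning') return part;
    episode += 1;
    return { ...part, label: `Thinking (step ${episode})` };
  });
}

// A tool-augmented turn: think, call a tool, think again, then answer.
const parts = [
  { type: 'reasoning', reasoning: 'I need the current price.' },
  { type: 'tool-lookup', input: { sku: 'A1' } }, // hypothetical tool part
  { type: 'reasoning', reasoning: 'The price is 10, so two cost 20.' },
  { type: 'text', text: 'Two units cost 20.' },
];

for (const part of labelReasoningEpisodes(parts)) {
  if (part.type === 'reasoning') console.log(`${part.label}: ${part.reasoning}`);
}
```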

FAQ

Which models support chain of thought?

Anthropic's thinking-enabled models and OpenAI's o-series surface reasoning tokens. Support varies across other models and providers, so check the model provider documentation.

Can I hide reasoning from the user?

Yes. Reasoning is a separate part type. Filter it out at the render layer. The content is still on the channel for any client that wants it.
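
A render-layer filter might look like the following. This is a sketch under the same part-shape assumption as the rendering example above. `visibleParts` and its `showReasoning` option are hypothetical names, not part of the SDK.

```javascript
// Return only the parts the UI should render.
// Reasoning is hidden unless the caller opts in.
function visibleParts(parts, { showReasoning = false } = {}) {
  return showReasoning ? parts : parts.filter((part) => part.type !== 'reasoning');
}

const messageParts = [
  { type: 'reasoning', reasoning: 'Compare both options first.' },
  { type: 'text', text: 'Option B is cheaper.' },
];

console.log(visibleParts(messageParts).length); // 1
console.log(visibleParts(messageParts, { showReasoning: true }).length); // 2
```

Because the filter runs on the client, a debug view or another client can still show the full reasoning for the same turn.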

Does cancelling cut off reasoning too?

Yes. Both the reasoning and text streams share the turn's abort signal.

Are reasoning tokens charged the same as text?

Yes. The channel does not distinguish between part types for billing. The cost depends on the published message count after rollup.

How do I render reasoning differently after the turn finishes?

The node.message.parts array stays available after the turn ends. Hide or collapse reasoning when the streaming flag flips to false.
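
One way to express this is a small function that derives panel state from the part and the streaming flag. The `reasoningPanelProps` helper, its return fields, and the `isStreaming` parameter are assumptions for illustration. Pass in whatever streaming indicator your view exposes.

```javascript
// Derive display state for a reasoning part.
// Expanded while the turn streams, collapsed once it has finished.
function reasoningPanelProps(part, isStreaming) {
  return {
    content: part.reasoning,
    collapsed: !isStreaming,
    title: isStreaming ? 'Thinking…' : 'Show reasoning',
  };
}

const reasoningPart = { type: 'reasoning', reasoning: 'Check the units first.' };

console.log(reasoningPanelProps(reasoningPart, true).collapsed); // false
console.log(reasoningPanelProps(reasoningPart, false).collapsed); // true
```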

  • Token streaming: how text tokens are streamed and accumulated.
  • Tool calling: another multi-part stream type within a turn.
  • Codec API: reference for the codec that multiplexes reasoning and text streams.