We built a Custom Transport for Vercel's AI SDK

The Vercel AI SDK's default transport is built for HTTP. That works until you need multi-device delivery, resumable streams, or more than one user in a conversation. We built a custom Ably transport for useChat, and hit the state machine assumptions it wasn't designed to handle.

Ably is a realtime messaging platform: a pub/sub product where you publish messages to channels, and clients subscribed to those channels receive them in realtime.
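
If you haven't used Ably before, the core API is small. Here's a minimal ably-js sketch; the channel name and key placeholder are just for illustration:

import * as Ably from 'ably';

const ably = new Ably.Realtime({ key: 'YOUR_ABLY_API_KEY' });
const channel = ably.channels.get('ai-responses');

// Every client subscribed to the channel receives every published message
await channel.subscribe((message) => {
  console.log('received:', message.data);
});

await channel.publish('token', { text: 'Pub/sub is ' });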

It turns out that the Ably realtime platform is really well suited to being the transport that sits between your AI models and the clients receiving the generated responses.

We're trying to meet developers where they currently are, and one of those places is the Vercel AI SDK. So we built a custom transport for the Vercel AI SDK that uses Ably as the transport layer. We want to expose all the features the Ably AI Transport supports to the AI SDK: multi-device, multi-user, resumable streams, human handoff, history compaction, barge-in and interruption, and more.

So this post covers what we managed to support when building against the AI SDK. It was an exercise in trying to make a library do something it wasn't originally designed for.

AI SDK or AI UI SDK?

The Vercel AI SDK comes in two flavors: the AI SDK, which runs on the server, and the AI UI SDK, which runs on the client. The UI SDK provides a set of React hooks and is where we focused most of our effort.

The main React hook you need to know about is useChat(...):

// inside a React component; useChat comes from '@ai-sdk/react'
const { messages, sendMessage, status } = useChat({
  transport: ablyChatTransport,
});

return (
  <div>
    {messages.map((m) => (
      <div key={m.id}>
        {m.role}: {m.parts.map((p) => (p.type === 'text' ? p.text : '')).join('')}
      </div>
    ))}
    <input
      onKeyDown={(e) => {
        if (e.key === 'Enter') {
          sendMessage({ text: e.currentTarget.value });
          e.currentTarget.value = '';
        }
      }}
    />
  </div>
);

useChat is the React hook that gives you the chatbot interface you'd expect from an AI assistant. It provides a 'messages' array that contains the messages in the conversation, and a 'sendMessage' function that you can use to send a message to the LLM.

The default transport over SSE

The default transport for the UI SDK is based on HTTP. The client makes an HTTP POST request carrying the user prompt and the conversation history, then holds the connection open while the server streams the response tokens back as server-sent events (SSE).
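
On the server, that typically looks like a single route handler, roughly like this (the model choice here is just an example):

import { convertToModelMessages, streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

// POST /api/chat: the server half of the default transport
export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages: convertToModelMessages(messages),
  });

  // Streams the response chunks back over the held-open HTTP response as SSE
  return result.toUIMessageStreamResponse();
}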

HTTP is an obvious choice for an SDK built by a team from Vercel, a serverless app platform based predominantly on HTTP.

HTTP streaming with SSE is a simple and common design, but it falls down when you try to add more advanced features, because:

  • It's not multi-device. If you have the chat open on your phone and your laptop, only one of those devices will receive the response.
  • It's not multi-user. If you have multiple users chatting with the same bot, they won't see each other's messages or responses.
  • It's not really resumable. SSE has lastEventId, which technically supports resume, but that only works if your server stores the individual SSE events and can replay them on reconnect. Most don't in practice. And if the user refreshes the page, the connection is gone and there's no way to pick up where you left off.
  • Cancellation sucks. The HTTP SSE stream isn't bidirectional, so cancellation means closing the HTTP connection entirely. Even the SDK's own stop() function is broken. It fires the abort signal but returns immediately without waiting for the stream to terminate, so buffered chunks keep arriving after you've supposedly stopped. There's also an open issue where stop() returns, the UI status stays streaming, and the server keeps generating tokens until completion. No barge-in or interruption support either.
  • There's no history; you need to build that separately.
  • There's no automatic compaction of tokens into full responses.

These are real problems that folks have encountered: the SDK has open issues for losing partial messages on stream errors and failing to resume streams mid-response.

These are all features that the Ably transport fully supports, but that aren't easy to bolt onto HTTP-based SSE responses.

The UI SDK exposes this transport using a ChatTransport interface, with the methods:

  • sendMessages() (send a prompt, return a stream of response chunks)
  • reconnectToStream() (resume after disconnect)

Implement these and you can swap out the default HTTP transport for anything.
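
Slightly simplified, the interface you're implementing looks roughly like this (we've trimmed the option types down to the fields that matter here; see the AI SDK's own types for the full signatures):

import type { UIMessage, UIMessageChunk } from 'ai';

// Simplified sketch: the real option objects carry a few more fields
interface ChatTransport<UI_MESSAGE extends UIMessage> {
  sendMessages(options: {
    chatId: string;
    messages: UI_MESSAGE[];
    abortSignal: AbortSignal | undefined;
  }): Promise<ReadableStream<UIMessageChunk>>;

  reconnectToStream(options: {
    chatId: string;
  }): Promise<ReadableStream<UIMessageChunk> | null>;
}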

useChat assumes one request and one response

The biggest issue we hit when building the custom transport was that useChat is designed around a single-request, single-response flow. It assumes that for every message you send, you get exactly one response back. That's a problem because the Ably transport is designed to support multiple responses to a single message, and multiple users participating in a single conversation.

useChat's state machine expects a series of chunks in response to a single user prompt.

User sends: "What is pub/sub?"

useChat reads these chunks from the stream returned by sendMessages():

  { type: 'step-start' }
  { type: 'text-start',  id: 'text-1' }
  { type: 'text-delta',  id: 'text-1', delta: 'Pub/sub is ' }
  { type: 'text-delta',  id: 'text-1', delta: 'a messaging pattern ' }
  { type: 'text-delta',  id: 'text-1', delta: 'where publishers send...' }
  { type: 'text-end',    id: 'text-1' }
  { type: 'step-finish', finishReason: 'stop' }
  { type: 'finish' }

status:  ready → submitted → streaming → ready

Each chunk is either a control message like start or finish, or a content message like text-delta containing tokens from the LLM response.
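
So on the client, our transport's job is essentially to turn messages arriving on an Ably channel into that chunk stream. Here's a stripped-down sketch of the idea; the 'chunk' event name and payload shape are our own illustration, not a documented protocol:

import * as Ably from 'ably';
import type { UIMessageChunk } from 'ai';

// Adapt messages arriving on an Ably channel into the stream useChat expects
function streamFromChannel(channel: Ably.RealtimeChannel): ReadableStream<UIMessageChunk> {
  return new ReadableStream<UIMessageChunk>({
    async start(controller) {
      await channel.subscribe('chunk', (message) => {
        const chunk = message.data as UIMessageChunk;
        controller.enqueue(chunk);
        if (chunk.type === 'finish') {
          channel.unsubscribe('chunk');
          controller.close();
        }
      });
    },
    cancel() {
      // useChat tore down the stream; stop listening to the channel
      channel.unsubscribe('chunk');
    },
  });
}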

The single-request, single-response assumption in useChat is obviously an issue if you want to support multiple users in the same conversation: only one of those users sent the prompt, but the prompt and the response should be fanned out to all the users in the conversation.

Internally, useChat tracks one activeResponse at a time. If two messages are sent concurrently, the second overwrites the first, the onFinish lifecycle hook fires once instead of twice, and you can end up crashing on undefined state. The community has asked for multi-message streaming but there's no support for it yet.

useChat's setMessages(...) backdoor

Sharing the conversation state between multiple users is easy over Ably channels, but updating that state in useChat is hard because of the single request single response design.

But useChat has a secret weapon: a setMessages(...) function that you can use to set the messages state directly. It's a backdoor that lets you bypass the state machine and set the conversation state to whatever you want.

This is what we ended up doing: we used setMessages(...) to set the message state directly, with the full conversation, no matter which user sent the prompt. This is what allowed us to support multi-user conversations.
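
In code, that looks something like this (a sketch: channel is the Ably channel from earlier, and the 'turn' event and its payload shape are our own illustration):

const { messages, setMessages } = useChat({ transport: ablyChatTransport });

useEffect(() => {
  // When another participant's turn arrives with the full conversation
  // attached, write it straight into useChat's message state
  const onTurn = (message: Ably.Message) => {
    setMessages(message.data.messages as UIMessage[]);
  };
  channel.subscribe('turn', onTurn);
  return () => {
    channel.unsubscribe('turn', onTurn);
  };
}, [setMessages]);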

The problem with this approach is that setMessages(...) completely bypasses the state machine, which immediately breaks a lot of the built-in features of useChat like lifecycle hooks and tool-call notifications.

Building around the limitations

Sometimes you just have to do the best with what you've got, and what we've got is a square peg and a round hole. useChat was never designed to support the kinds of features we're trying to add to it. So we built around the limitations by tracking 'own-turns' (prompts submitted by this client, plus the LLM response to that prompt) and 'observer-turns' (prompts submitted by other clients, plus their responses).

Own-turns trigger the full lifecycle: they go through the regular sendMessages(...) flow in useChat and fire lifecycle hooks and tool-call notifications as normal.

Observer-turns are set directly with setMessages(...) and bypass the lifecycle hooks and tool-call notifications, but at least they show up in the conversation for all users. We also have to temporarily buffer observer-turns if there's currently an own-turn in progress, because the state machine doesn't support interleaving messages from multiple responses.
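
The routing decision, sketched in TypeScript (every name here is ours, not the SDK's; myClientId and setMessages are assumed to be in scope):

// Hypothetical shape for a turn arriving on the channel
interface Turn {
  clientId: string;       // who submitted the prompt
  messages: UIMessage[];  // the full conversation including this turn
}

let ownTurnInProgress = false;
const observerBuffer: Turn[] = [];

function handleTurn(turn: Turn) {
  if (turn.clientId === myClientId) {
    // Own-turn: its response is already flowing through the stream returned
    // by sendMessages(), so useChat's state machine handles it normally
    return;
  }
  if (ownTurnInProgress) {
    // The state machine can't interleave responses, so park
    // observer-turns until our own response finishes
    observerBuffer.push(turn);
    return;
  }
  // Observer-turn: write it straight into state, bypassing the machine
  setMessages(turn.messages);
}

function onOwnTurnFinished() {
  ownTurnInProgress = false;
  // Flush anything that arrived while we were streaming
  for (const turn of observerBuffer.splice(0)) handleTurn(turn);
}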

So what extra can you do with useChat and the Ably transport?

Actually quite a lot. The Ably transport can add these features to useChat:

  • Multi-device support, with automatic fan-out to every device you have the chat open on.
  • Multi-user conversations, with more than one user submitting prompts and receiving responses in the same conversation.
  • Resumable streams: if you lose your connection, you can reconnect and receive the rest of the response automatically.
  • Human handoff: a human can take over the conversation at any time and respond to user prompts, because multi-user support is already there.
  • Interruptions, cancellation, and barge-in: you can interrupt or steer the LLM conversation at any time by sending a new prompt, even if the previous response hasn't finished yet. This works because the Ably transport uses channels, and channels are a bidirectional streaming layer.
  • History compaction: the tokens from LLM responses are automatically compacted into a single message in the conversation history, so live clients receive tokens progressively in realtime, while clients joining the conversation later receive the full response in one message.

There's a whole bunch more to the Ably AI Transport than we've covered here. But even with the limitations of useChat, just swapping the HTTP transport for the Ably transport unlocks a whole bunch of extra features: multi-device, multi-user, resumable streams, human handoff, interruptions, and history compaction.

If you don't want to be constrained by useChat at all, the Ably AI Transport SDK also provides useClientTransport and useView: React hooks that give you direct access to the transport and the conversation tree without going through useChat's state machine. You still get the Vercel AI SDK's stream format, but you're not fighting the single-response assumptions. Check out our AI Transport SDK if you're interested.