TL;DR Calling stop() in Vercel AI SDK closes the client-side connection but the server typically keeps generating. HTTP abort signals are binary - alive or dead - and can't carry intent. In distributed deployments the abort request may arrive at a different server instance than the one streaming. The server has no reliable way to distinguish an intentional stop from a network drop. The fix is a bidirectional channel where cancel is an explicit published message, not a connection close.
You call stop(), the spinner disappears, and your UI looks like it worked. Meanwhile, on the server, tokens are still flowing - and you're being billed for all of them.
This is a known limitation of HTTP-based streaming in distributed deployments. It gets worse the more distributed your infrastructure is.
What stop() actually does
stop() in useChat closes the client-side HTTP connection. That's it. There is no signal path from the client to the server on the streaming connection itself - SSE is one-way by design.
To compensate, the SDK sends an abort via AbortController; the abort travels to the server as a separate HTTP request. On a single local server this works most of the time. In production it often doesn't.
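The single-process happy path is easy to sketch: when the abort signal and the generation loop live in the same process, checking signal.aborted works. The distributed failure is precisely that this shared reference does not exist on the instance holding the stream. A minimal illustration (function and token names are ours, not the SDK's):

```typescript
// Illustrative sketch: an AbortSignal only stops generation when the
// generating code holds a reference to that same signal in-process.
async function generate(
  signal: AbortSignal,
  stopAfter: number,
  requestStop: () => void // stands in for the client calling stop()
): Promise<string[]> {
  const tokens: string[] = [];
  for (const t of ["alpha", "beta", "gamma", "delta"]) {
    if (signal.aborted) break; // only works if this process sees the signal
    tokens.push(t);
    if (tokens.length === stopAfter) requestStop();
  }
  return tokens;
}

const controller = new AbortController();
generate(controller.signal, 2, () => controller.abort()).then((tokens) => {
  console.log(tokens); // generation halts after two tokens
});
```

On a distributed deployment, the instance running generate() may never receive the controller that the abort request reached, so the loop runs to completion.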
The maintainer explained why in GitHub issue #6502:
"The hard part: keep a reference on the server to the abort signal... in particular in a distributed scenario where the abort request could arrive in a different machine or even data center."
This is the root problem. The abort signal and the streaming response travel independently. They have no guarantee of reaching the same process.
What developers actually see
The failure modes follow a consistent pattern across hundreds of GitHub issues:
The server ignores the abort. stop() returns immediately on the client. The server continues generating to completion. The user sees a stopped UI; the backend runs to the end of the response and charges accordingly.
The abort arrives too late. With streaming, the abort signal may arrive at the server after generation has already advanced past the point where stopping would be meaningful. Token delivery continues; the client discards the output it doesn't want.
The abort arrives at the wrong instance. Load-balanced deployments route requests across multiple server instances. The HTTP abort - sent as a separate request - may land on a different instance than the one holding the stream. That instance has no stream to abort.
The abort is misread as a disconnect. When a client closes a connection without sending an explicit stop, the server sees a TCP close. There is no way for the server to distinguish intentional stop from network failure, tab close, or mobile backgrounding. One team reported getting billed for approximately 13,000 tokens after a client abort that the server never received as intentional (#8325).
The binary problem with HTTP abort
HTTP streaming is unidirectional. The only signal a client can send on a live SSE connection is to close it. That close - the TCP FIN - is binary. It carries no information about why the connection closed.
The Vercel AI SDK's AbortController approach works around this by sending a second HTTP request. But that request competes with the stream for routing, timing, and infrastructure - and it can lose.
There is no workaround that makes this reliable at the HTTP level. The SDK maintainer noted in issue #6502 that solving this requires "a channel to the server" - something SSE's one-way architecture cannot provide.
This is an architectural constraint of one-way HTTP, not a bug in the SDK.
The cost
The practical consequence is paying for compute that produces output nobody wants. The ~13,000-token bill in #8325 came from exactly this: a client abort the server never registered as intentional. For long-running agents with expensive models, the exposure is significant. The failure is also silent: the server has no reason to log that it ignored a stop it never received. You typically don't know this is happening until you audit your API bills.
The secondary cost is UX. Your interface shows a stopped state. The server is still running. If the user starts a new request, they now have two concurrent generations competing for resources on the same session.
The fix: explicit signals over a bidirectional channel
The reliable fix is to stop relying on connection close as the stop signal. Instead, publish an explicit cancel message on a channel the server is actively listening to.
With a bidirectional transport, stop is a message: { type: "cancel", sessionId: "..." }. The server receives it on the same channel as the session. It knows the difference between cancel and disconnect because they arrive as different message types - not as the presence or absence of a TCP connection.
This works reliably across distributed infrastructure because the cancel travels on the same channel as the session, not as a separate HTTP request that may be routed elsewhere. The same channel can also carry steer and redirect instructions mid-stream - changing direction without stopping, which is currently impossible over SSE.
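On the server side, the gain is that intent becomes a branch in a message handler rather than a guess about a closed socket. A sketch, with illustrative message shapes and handler names:

```typescript
// Sketch: cancel and disconnect arrive as distinct typed messages, so the
// server never has to infer intent from the absence of a TCP connection.
type ControlMessage =
  | { type: "cancel"; sessionId: string }
  | { type: "disconnect"; sessionId: string };

type SessionState = "cancelled" | "awaiting-resume";

function handleControl(msg: ControlMessage): SessionState {
  switch (msg.type) {
    case "cancel":
      // Explicit stop: halt generation and release resources immediately.
      return "cancelled";
    case "disconnect":
      // Ambiguous drop: keep the session warm and buffer for a resume.
      return "awaiting-resume";
  }
}
```

The two branches can then drive different behavior: tear down the generation on "cancelled", keep buffering tokens for a reconnect on "awaiting-resume".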
Vercel's ChatTransport interface is the integration point. Swapping in a bidirectional transport gives useChat an explicit signal path without any changes to application code:
```typescript
const { messages, stop } = useChat({
  transport: new MyBidirectionalTransport()
});
```

stop() now publishes a cancel message. The server receives it as intent, not inference.
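The core of such a transport can be sketched in a few lines. This is a hypothetical illustration of the stop-as-publish idea only - the class and ControlChannel interface here are stand-ins, not the real ChatTransport signature or Ably's API:

```typescript
// Hypothetical sketch: stop() publishes a typed cancel message on the same
// channel that carries the session, instead of closing a connection.
interface ControlChannel {
  publish(msg: { type: "cancel"; sessionId: string }): void;
}

class BidirectionalStopSketch {
  constructor(
    private channel: ControlChannel,
    private sessionId: string
  ) {}

  stop(): void {
    // Intent is an explicit message, not a dropped TCP connection.
    this.channel.publish({ type: "cancel", sessionId: this.sessionId });
  }
}
```

Because the cancel rides the session's own channel, it reaches whichever instance is subscribed to that session - there is no second HTTP request to misroute.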
What to look for in a transport
Not every WebSocket transport solves the stop problem reliably. The relevant capabilities:
Typed control messages. The transport should support sending structured messages - cancel, steer, redirect - as first-class channel operations, not raw TCP closes.
Session identity. The cancel message needs to be tied to the correct session, reliably, across distributed infrastructure. A transport that manages session IDs at the channel level handles this without custom code.
Delivery guarantees. A cancel that gets dropped is no better than the current abort that gets misrouted. The transport should confirm delivery.
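Delivery confirmation can be sketched as a retry loop around an acknowledged publish. The publishWithAck callback and the retry policy below are illustrative, not any specific transport's API:

```typescript
// Sketch: retry a cancel until the channel acknowledges it, so a dropped
// cancel is retransmitted rather than silently lost.
async function publishCancelWithRetry(
  publishWithAck: (msg: { type: "cancel"; sessionId: string }) => Promise<boolean>,
  sessionId: string,
  maxAttempts = 3
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await publishWithAck({ type: "cancel", sessionId })) return true;
  }
  return false; // surface the failure to the UI instead of assuming success
}
```

The important design choice is the final return: if the cancel cannot be confirmed, the client should show that the stop failed rather than display a stopped state the server never saw.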
Ably AI Transport implements the ChatTransport interface and provides bidirectional signaling with delivery guarantees. Cancel is a channel message, not a connection close. Visit the Ably AI Transport overview, read the documentation, or sign up free.
Ready to build? Get started with Vercel AI SDK.
Research basis: analysis of 300+ GitHub issues in vercel/ai repository; 31 Vercel Community Forum threads (65% unresolved). GitHub issues cited: #6502, #309 ("streamText: add callback for handling aborted streams"), #1122 ("Unable to cancel or abort streaming UI server actions"), #10719, #8325 ("AI Gateway keeps fetching full stream even if client aborts - billed ~13K tokens"). Maintainer quote from Lars Grammel, Vercel (#6502). Billing impact confirmed by Nico Albanese, Vercel (#8325).
Recommended Articles
Vercel AI SDK resumable-stream: what it covers and what it doesn't
Vercel's resumable-stream covers page reloads only. Tab switches, mobile backgrounding, and device switches lose the stream. Also incompatible with stop().
WebSockets on Vercel: why serverless functions can't host them
Vercel serverless functions can't host WebSocket connections, even with Fluid Compute. Options and how to connect a WebSocket provider to Vercel AI SDK.
Vercel AI SDK ChatTransport: implementing a custom WebSocket transport
ChatTransport in Vercel AI SDK 5 lets you replace the default HTTP transport with WebSockets. Application code, agents, and UI stay unchanged.