1. Topics
  2. /
  3. Protocols
  4. /
  5. The challenge of scaling WebSockets [with video]
18 min readUpdated Sep 26, 2024

The challenge of scaling WebSockets [with video]

Jo Stichbury
Written by:
Jo Stichbury

When it comes to scaling WebSockets, there’s no such thing as a one-size-fits-all solution. Supporting a high volume of connections while maintaining the lowest possible latency is the name of the game, but your specific scaling strategy likely depends on your project’s requirements.

On this page, we’ll outline the considerations you need to make, and present a summary of the options for scaling WebSockets. 

Based on our experience of scaling Ably to support an almost-infinite number of connections, we explore not only the theory behind scaling WebSockets, but address scaling issues that creep up in the real world as well. 

We’ve summarized the key tips in this video. Feel free to read on below for more info.

Copy link to clipboard

What are WebSockets?

In a nutshell, WebSocket is a realtime web technology that enables bidirectional, full-duplex communication between client and server over a persistent connection. The WebSocket connection is kept alive for as long as needed (in theory, it can last forever), allowing the server and the client to send data at will, with minimal overhead.

Learn more:

Copy link to clipboard

Are WebSockets scalable?

Yes, WebSockets are scalable. Companies like Slack, Netflix, and Uber use WebSockets to power realtime features in their apps for millions of end-users. For example, Slack uses WebSockets for instant messaging between chat users

However, scaling WebSockets is non-trivial, and involves numerous engineering decisions and technical trade-offs. This blog post describes the challenges of scaling WebSockets dependably, so you can be better prepared to tackle them. We’ll go through the following topics:

Copy link to clipboard

Scaling the WebSocket server layer

There are two main paths you can take to scale your server layer: vertical scaling or horizontal scaling.

Horizontal scaling can be challenging but is an effective way to scale your WebSocket application.

Copy link to clipboard

What is vertical scaling? 

Vertical scaling, or scale-up, adds more power (e.g., CPU cores and memory) to an existing server.

Copy link to clipboard

Disadvantages of vertical scaling for WebSockets

At first glance, vertical scaling seems attractive, as it’s pretty straightforward to implement. However, imagine you’ve developed an increasingly popular live user experience such as a live sports data app. It’s a success story, but you’re dealing with an ever-growing number of WebSocket connections as more people sign up and use the app. If you have just one WebSocket server, there’s a finite amount of resources you can add to it, which limits the number of concurrent connections your system can handle. 

Some server technologies, such as NodeJS, cannot take advantage of extra CPU cores. Running multiple server processes can help, but you’ll need an additional mechanism, such as another intervening reverse proxy, to balance traffic between the server processes.

With vertical scaling, you have a single point of failure. What happens if your WebSocket server fails or you need to do some maintenance and take it offline? It’s not a suitable approach for a product system, which is why the alternative, horizontal scaling, is recommended instead.

Copy link to clipboard

What is horizontal scaling?

Horizontal scaling, or scale-out, involves adding more servers to share the processing workload. If you compare horizontal and vertical scaling, horizontal scaling is a more available model in the long run. Even if a server crashes or needs to be upgraded, you are in a much better position to protect your overall availability because the workload is distributed to the other nodes in the network. 

TIP: Horizontal scaling is a superior alternative to vertical scaling because there is no single point of failure or absolute limit to which you can scale. 

Copy link to clipboard

Challenges of horizontal scaling for WebSockets

To ensure horizontal scalability, you’ll need to tackle some engineering complexity.

A major element is that you need to ensure the servers share the compute burden evenly: you need a load balancer layer that can handle traffic at scale. 

Load balancers detect the health of backend resources. If a server goes down, the load balancer redirects its traffic to the remaining operational servers. The load balancer automatically starts distributing traffic whenever you add a new WebSocket server. 

Load balancing distributes incoming network traffic across a group of backend servers. 

TIP: It’s much more complicated to balance load efficiently and evenly across machines with different configurations, so aim for homogeneity.

Copy link to clipboard

Load balancing WebSockets

Load balancing is an essential part of horizontal scalability, since an effective load balancing strategy endows your architecture with the following characteristics:

  • Fault tolerance, high availability, and reliability.

  • Ensures no single server is overworked, which can degrade performance.

  • Minimizes server response time and maximizes throughput.

Load balancing also enables you to add or remove servers as demand dictates.

Copy link to clipboard

Load balancing algorithms

A load balancer will follow an algorithm to determine how to distribute requests across your server farm. There are many different ways to load balance traffic, and the algorithm you select will depend on your needs. 

Round-robin: Each server gets an equal share of the traffic. For a simplified example, let’s assume we have two servers, A and B. The first connection goes to server A; the second goes to server B; the third goes to server A; the fourth goes to B, and so on.

Weighted round-robin: Each server gets a different share of the traffic based on capacity.

Least-connected: Each server gets a share of the traffic based on how many connections it currently has.

Least-loaded: Each server gets a share of the traffic based on how much load it currently has.

Least response time: Traffic is routed to the server that takes the least time to respond to a health monitoring request (the response speed indicates how loaded a server is). Some load balancers might also factor in each server’s number of active connections.

Hashing methods: Routing is decided by a hash of various data from the incoming connection, such as port number, domain name, and IP address.

Random two choices: The load balancer randomly picks two servers and routes a new connection to the machine with the fewest active connections.

Custom load: The load balancer queries the load on individual servers using the Simple Network Management Protocol (SNMP) and assigns a new connection to the server with the best load metrics. You can define various metrics to look at, such as CPU usage, memory, and response time. 

Copy link to clipboard

Load balancing depends on the use case

Your choice of algorithm will depend on your most common usage scenario. Consider, for example, a situation where you have the same number of messages sent to all connected clients, such as live score updates. You can use the round-robin approach if your servers have roughly identical computing capabilities and storage capacity. 

By contrast, if your typical use case involves some connections being more resource-intensive than others, a round-robin strategy might not distribute the load evenly, and you would find it better to use the least bandwidth algorithm.   

TIP: Have a good understanding of your use case, in terms of typical usage patterns and bandwidth, before you choose a load-balancing algorithm.

Copy link to clipboard

Sticky sessions

A sticky session is a load-balancing strategy where each user is “sticky” to a specific server. For example, if a user connects to server A, they will always connect to server A, even if another server has less load.

Sticky sessions can be helpful in some situations but can also be fragile and hinder your approach to scale dynamically. For example, if your WebSocket server becomes overwhelmed and needs to shed connections to balance traffic, or if it fails, a sticky client will keep trying to reconnect to it. It’s hard to rebalance a load when sessions are sticky, and it’s more optimal to use non-sticky sessions accompanied by a mechanism that allows your WebSocket servers to share connection state to ensure stream continuity without needing a connection to the same server. 

Copy link to clipboard

WebSocket fallback strategy

While this article covers WebSockets in particular, there is rarely a one-size-fits-all protocol in large-scale systems. Different protocols serve different purposes better than others. Under some circumstances, you won’t be able to use WebSockets. For example:

  • Some proxies don’t support the WebSocket protocol or terminate persistent connections.

  • Some corporate firewalls, VPNs, and networks block specific ports, such as 443 (the standard web access port that supports secure WebSocket connections).

  • WebSockets are still not entirely supported across all browsers.

Your system needs a fallback strategy, and many WebSocket solutions offer such support. For example, Socket.IO will opaquely try to establish a WebSocket connection if possible and will otherwise fall back to HTTP long polling

In the context of scale, it’s essential to consider the impact that fallbacks may have on the availability of your system. Suppose you have many simultaneous users connected to your system, and an incident causes a significant proportion of the WebSocket connections to fall back to long polling. Your server will experience greater demand as that protocol is more resource-intensive (increased RAM usage). 

To ensure your system’s availability and uptime, your server layer needs to be elastic and have enough capacity to deal with the increased load. 

TIP: Falling back to another protocol changes your scaling parameters because stateful WebSockets fundamentally differ from stateless HTTP. An ideal load balancing strategy for WebSockets might not always apply equally well; you may consider different server farms to handle WebSocket vs. non-WebSocket traffic. 

Copy link to clipboard

Handling unpredictable loads on WebSockets

In addition to horizontal scaling, you should also consider the elasticity of your WebSocket server layer so that it can cope with unpredictable numbers of end-user connections. Design your system in such a way that it’s able to handle an unknown and volatile number of simultaneous users. 

There’s a moderate overhead in establishing a new WebSocket connection — the process involves a non-trivial request/response pair between the client and the server, known as the opening handshake. Imagine tens of thousands or millions of client devices trying to open WebSocket connections simultaneously. Such a scenario leads to a massive burst in traffic, and your system needs to be able to cope. 

You should architect your system based on a pattern designed to scale sufficiently to handle unpredictability. One of the most popular and dependable choices is the publish and subscribe (pub/sub) pattern.

Pub/sub provides a framework for message exchange between publishers (typically your server) and subscribers (often, end-user devices). 

Publishers and subscribers are unaware of each other, as they are decoupled by a message broker, which usually groups messages into channels (or topics). Publishers send messages to channels, while subscribers receive messages by subscribing to relevant channels.

The pub/sub pattern

Decoupling can reduce engineering complexity. There can be limitless subscribers as only the message broker needs to handle scaling connection numbers. As long as the message broker can scale predictably and reliably, your system can deal with the unpredictable number of concurrent users connecting over WebSockets.

Numerous projects are built with WebSockets and pub/sub; many open-source libraries and commercial solutions combine these elements. Examples include Socket.IO with the Redis pub/sub adapter, SocketCluster.io, or Django Channels. 

Copy link to clipboard

Managing WebSocket connections

We’ll now go through a few considerations related to WebSocket connection management: restoring connections, managing heartbeats, and handling backpressure.

Copy link to clipboard

Load shedding WebSocket connections

At a particular scale, you may have to deal with traffic congestion since if the situation is left unchecked, it can lead to cascading failures and even a total collapse of your system. 

You need a load shedding strategy to detect congestion and fail gracefully when a server approaches overload by rejecting some or all of the incoming traffic. 

Here are a few things to have in mind when shedding connections:

  • You should run tests to discover the maximum load that your system is generally able to handle. Anything beyond this threshold should be a candidate for shedding.

  • You must consider a backoff mechanism to prevent rejected clients from attempting to reconnect immediately; this would just put your system under more pressure. 

  • You might also consider dropping existing connections to reduce the load on your system; for example, the idle ones (which, even though idle, are still consuming resources due to heartbeats). 

TIP: You need to have a load shedding strategy; failing gracefully is always better than a total collapse of your system.  

Copy link to clipboard

Restoring connections 

Connections inevitably drop, for example, as users lose connectivity or if one of your servers crashes or sheds connections. When scenarios like these occur, WebSocket connections need to be restored.

Reconnection strategies

You could implement a script to reconnect clients automatically. However, suppose reconnection attempts occur immediately after the connection closes. If the server does not have enough capacity, it can put your system under more pressure and lead to cascading failures. 

An improvement would be to exponentially increase the delay after each reconnection attempt, increasing the waiting time between retries to a maximum backoff time. Compared to a simple reconnection script, this is better because it gives you some time to add more capacity to your system so that it can deal with all the WebSocket reconnections. 

You can improve exponential backoff by making it random, so not all clients reconnect simultaneously.

TIP: Use a random exponential backoff mechanism when handling reconnections to protect your server layer from being overwhelmed, prevent cascading failures, and allow time to add more capacity.

Reconnections with continuity

Data integrity (guaranteed ordering and exactly-once delivery) is crucial for some use cases. Once a WebSocket connection is re-established, the data stream must resume precisely where it left off. Think, for example, of features like live chat, where missing messages due to a disconnection or receiving them out of order leads to a poor user experience and causes confusion and frustration. 

If resuming a stream exactly where it left off after brief disconnections is essential to your use case, you’ll need to consider how to cache messages and whether to transfer data to persistent storage. You’ll also need to manage stream resumes when a WebSocket client reconnects and think about how to synchronize the connection state across your servers. 

TIP: Some connections will inevitably break at some point. Determine your strategy to ensure that after WebSocket connections are restored, you can resume the stream with guaranteed ordering and (preferably exactly once) delivery.

Copy link to clipboard

Managing heartbeats

The WebSocket protocol natively supports control frames known as Ping and Pong. These control frames are an application-level heartbeat mechanism to detect whether a WebSocket connection is alive. At scale, you should closely monitor heartbeats’ effect on your system. Thousands or millions of concurrent connections with a high heartbeat rate will add a significant load to your WebSocket servers. If you examine the ratio of Ping/Pong frames to actual messages sent over WebSockets, you might send more heartbeats than messages. If your use case allows, reduce the frequency of heartbeats to make it easier to scale. 

TIP: Keep track of idle connections and close them. Even if no messages (text or binary frames) are being sent, you are still sending ping/pong frames periodically, so even idle connections consume resources.  

Copy link to clipboard

Handling backpressure

Backpressure is one of the critical issues you will have to deal with when streaming data to client devices at scale over the internet. For example, let’s assume you are streaming 20 messages per second, but a client can only handle 15 messages per second. What do you do with the remaining five messages per second that the client cannot consume? 

You need a way to monitor the buffers building up on the sockets used to stream data to clients and ensure a buffer never grows beyond what the downstream connection can sustain. Beyond client-side issues, if you don’t actively manage buffers, you’re risking exhausting the resources of your server layer — this can happen very fast when you have thousands of concurrent WebSocket connections. 

A typical backpressure corrective action is to drop packets indiscriminately. To reduce bandwidth and latency, in addition to dropping packets, you should consider something like message delta compression, which generally uses a diff algorithm to send only the changes from the previous message to the consumer rather than the entire message.      

Copy link to clipboard

Best practices for using WebSockets at scale

Here are best practices for scaling your WebSocket infrastructure:

✅ Use horizontal scaling rather than vertical scaling. It’s more reliable, especially for use cases where you can’t afford your system to be unavailable under any circumstances.

✅ If possible, use smaller machines (servers) rather than large ones. They are easier and faster to spin up, and costs are more granular.

✅ Aim to have a homogeneous server farm. It’s much more complicated to balance load efficiently and evenly across machines with different configurations.

✅ Have a good understanding of your use case and relevant parameters (such as usage patterns and bandwidth) before choosing a load balancing algorithm. 

✅ Ensure your server layer is dynamically elastic, so you can quickly scale out when you have traffic spikes. You should also operate with some capacity margin, and have backups for various system components, to ensure redundancy and remove single points of failure.

✅ There is rarely a one-size-fits-all protocol in large-scale systems; different protocols serve different purposes better than others. You need to think about what other options your system needs to support in addition to WebSockets, and consider ways to ensure protocol interoperability.  

✅ You most likely need to support fallback transports, such as Comet long polling, because WebSockets, although widely supported, are blocked by certain enterprise firewalls and networks. Note that falling back to another protocol changes your scaling parameters; after all, stateful WebSockets are fundamentally different from stateless HTTP, so you need a strategy to scale both.

✅ Run load and stress testing to understand how your system behaves under peak load, and enforce hard limits (for example, maximum number of concurrent WebSocket connections) to have some predictability.

✅ WebSocket connections and traffic over the public internet are unpredictable and rapidly shifting. You need a robust realtime monitoring and alerting stack, to enable you to detect and quickly implement remedial measures when issues occur. 

✅ You need to have a load shedding strategy; failing gracefully is always better than a total collapse of your system.  

✅ Use a tiered infrastructure to enable you to recover from faults and coordinate between servers.

✅ Some WebSocket connections will inevitably break at some point. You need a strategy for ensuring that after the WebSocket connections are restored, you can resume the stream with ordering and delivery (preferably exactly-once) guaranteed.

✅ Use a random exponential backoff mechanism when handling reconnections. This allows you to protect your server layer from being overwhelmed, prevents cascading failures, and gives you time to add more capacity to your system.

✅ Keep track of idle connections and close them. Even if no messages (text or binary frames) are being sent, you are still sending ping/pong frames periodically, so even idle connections consume resources.

Copy link to clipboard

Avoiding the challenge of scaling WebSockets 

We have discussed the challenge of scaling WebSockets for production-quality apps and online services, which can involve significant engineering complexity and draw heavily upon resources and time. You could spend several months and a heap of money. So how do you avoid this complexity?

Before anything else, you should consider if WebSocket is the best choice for your use case, or if a WebSocket alternative is better suited. For example, if you only need to push text (string) data to browser clients, and you never expect to require bidirectional communication, then you could use something like Server-Sent Events (SSE). Compared to WebSockets, SSE is less complex and demanding, and easier to scale.  

Read about the differences between SSE and WebSockets

On the other hand, if you’re building a web app where bidirectional communication is needed (e.g, a chat app), then WebSockets is indeed the ideal choice. But this doesn’t mean you have to deal with the challenges and costs of scaling and maintaining WebSocket infrastructure in-house; you can offload this complexity to a managed third-party WebSocket service such as Ably and reduce your cost

Copy link to clipboard

Ably, the WebSocket platform that works reliably at any scale

Ably is a realtime experience infrastructure provider. Our APIs and SDKs help developers build and deliver realtime experiences without having to worry about maintaining and scaling messy WebSocket infrastructure. 

We offer:

  • Pub/sub messaging over serverless WebSockets, with rich features such as message delta compression, automatic reconnections with continuity, user presence, message history, and message interactions.

  • A globally-distributed network of datacenters and edge acceleration points-of-presence. 

  • Guaranteed message ordering and delivery. 

  • Global fault tolerance and a 99.999% uptime SLA.

  • < 50ms round-trip latency (P99).

  • Dynamic elasticity, so we can quickly scale to handle any demand (billions of WebSocket messages sent to millions of pub/sub channels and WebSocket connections). 

Copy link to clipboard

FAQs on scaling WebSockets

Copy link to clipboard

How many WebSocket connections can a server handle?

In theory, a server can handle up to 65,536 sockets per single IP address. In practice, though, there are many factors that influence how many WebSocket connections can be handled by a server: hardware resources, network bandwidth, size and frequency of messages that are transmitted over the WebSocket connections, etc. 

For example, if there are  limited hardware resources or if the WebSocket application requires a lot of processing power, the number of connections that the server can handle may be lower. 

Ultimately, the best way to determine the maximum number of WebSocket connections that a server can handle is to test it under realistic conditions and monitor its performance. This will help identify any bottlenecks or performance issues and allow you to optimize the server and application to handle a larger number of connections. You should also consider scaling horizontally rather than relying on a single server. 

Copy link to clipboard

How many WebSocket connections per client?

WebSocket libraries and web browsers enforce different limits on the number of WebSocket connections. For example, at the time of writing (14th of April 2023), Chrome allows up to 256 concurrent connections per client by default, while Firefox can handle 200 connections per client by default. 

Join the Ably newsletter today

1000s of industry pioneers trust Ably for monthly insights on the realtime data economy.
Enter your email