At Ably we provide a service that handles high volumes of client connections – multiple millions of concurrent WebSocket and HTTP streaming connections. These connections are terminated by a set of frontend compute instances; distributing the connections among available instances is the role of one or more load balancers. Each load balancer exposes a single network endpoint externally and forwards individual requests to frontend server instances managed in a pool that can scale to match the request load. By providing this multiplexing capability, a load balancer becomes an essential building block of a scalable service.
Load balancers can provide additional functionality. For example, they can perform health checks on the upstream server instances so that requests are forwarded only to healthy instances; they can terminate TLS, or provide application-level proxying, and in turn that can be the basis of other functionality (such as sticky routing based on HTTP cookies).
The ask: practically infinite scalability
AWS has provided an Elastic Load Balancer (ELB) component from the earliest days of the EC2 service; this classic offering has been available since 2009.
The attractiveness of having this functionality available as a service is that it takes care of one of the difficult scaling points of service architecture; the instances upstream of the load balancer can be stateless and only need to be concerned with handling individual requests without the need for coordination or contention with other instances. A potentially infinitely scalable service can be constructed with a very small set of primitives: an elastic load balancer and a scaling group of server instances.
At Ably we used classic ELBs as a core element of the architecture of our AWS-hosted service after a study of AWS's performance and scalability characteristics.
AWS subsequently updated its load balancer offerings and introduced Network Load Balancers (NLBs). These are Layer 4 devices: an NLB terminates incoming TCP/TLS connections if required, determines which upstream target should receive each connection, and establishes an onward TCP connection to that target.
The available targets are managed by the NLB as a target group. Elasticity of the upstream target pool is possible by adding and removing targets from the target group. Elasticity of the NLB itself is intrinsic to the NLB service; these are advertised by AWS as scaling effortlessly to millions of requests per second, and being able to automatically scale to meet the vast majority of workloads. Keen to improve the elasticity offered by ELBs, we migrated the Ably infrastructure to use NLBs.
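Scaling the target pool therefore reduces to registering and deregistering targets as capacity needs change (in practice via the Elastic Load Balancing RegisterTargets and DeregisterTargets API operations). A minimal sketch of the reconciliation step, using hypothetical instance IDs and a made-up helper name:

```python
def reconcile_targets(current: set[str], desired: set[str]) -> tuple[set[str], set[str]]:
    """Compute which targets to register and which to deregister so that
    the target group membership matches the desired set of instances."""
    to_register = desired - current
    to_deregister = current - desired
    return to_register, to_deregister

# Example: scale a target group from 3 frontends to 4, replacing one instance.
current = {"i-aaa", "i-bbb", "i-ccc"}
desired = {"i-aaa", "i-bbb", "i-ddd", "i-eee"}
add, remove = reconcile_targets(current, desired)
```

The actual API calls are omitted; the point is that the elastic part of the system is the membership of the target group, while the NLB itself is meant to scale transparently.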
In many respects NLBs worked very well initially, and certainly eliminated the “warming” issues we had always experienced with classic ELBs. However, in testing the scalability of NLBs, we found a number of substantial challenges when trying to take advantage of the suggested effortless and practically infinite scalability.
The application: millions of realtime subscriptions
Ably provides a pub/sub messaging service that supports applications at extraordinary scale: we often run tens of millions of concurrent, long-running client connections (instead of connect/reconnect polling).
Workloads are characterized by high connection counts, and high rates of new connection establishment. Traffic on individual connections ranges from low (single digit messages per minute) up to hundreds of messages per second.
Elasticity is a key characteristic of the load: a wide range of applications from live events, sports, audience participation, and social networking will experience load fluctuations of multiple orders of magnitude. This is a key reason why application developers look to outsource the connection management problem to service providers, and why service providers look to cloud solutions.
We routinely perform load testing of the service. This is to prove stability of the service, explore scalability limits, and we also frequently do this when being competitively evaluated by potential customers. After an experimental migration to NLBs, we were keen to understand how far this new primitive could go in supporting elasticity and scale.
We use a derivative of the Boomer load generator for Locust for much of our load testing. This enables us to run multiple thousands of load generator processes and coordinate the load projected towards a test service endpoint for a range of use-cases. A very basic routine test involves steadily ramping up the number of connections to verify a given client population can be supported.
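As an illustration of such a ramp test (this is a simplified sketch, not the Boomer implementation), the coordinator can compute, for each second of the ramp, the total connection target and the per-worker share:

```python
import math

def ramp_schedule(total_connections: int, ramp_seconds: int, workers: int):
    """Yield (second, target_total, target_per_worker) tuples for a linear
    connection ramp, with the total split evenly across load-generator workers."""
    per_second = total_connections / ramp_seconds
    for t in range(1, ramp_seconds + 1):
        target = min(total_connections, math.ceil(per_second * t))
        per_worker = math.ceil(target / workers)
        yield t, target, per_worker

# Ramp to 1M connections over 1000 seconds, using 2000 worker processes.
schedule = list(ramp_schedule(1_000_000, 1000, 2000))
```

Each worker then opens or closes connections to track its per-second target while the coordinator monitors connection time and message latency.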
Limit 1: maximum target group size
The first NLB limit we hit was a limit on the number of targets (i.e. frontend server instances) that can belong to a target group. The AWS NLB documentation explains the existence of a number of quotas. As with other quotas, you might expect these could be increased as an account management operation.
However, in direct discussion with AWS, it became clear that, due to architectural constraints, some of those quotas are in fact hard limits that cannot be raised. One of them is the maximum size of a target group, which is 500 targets. This places a strict and immediate cap on the size of a system targeted by a single load balancer.
There are various strategies to deal with this limit.
Vertical EC2 scaling
If the server software stack can make use of larger server instances, then capacity can clearly be increased, within the constraint on total target group size, by increasing the capacity of each target.
The limitations of this approach are the following:
- You can only scale so far with vertical scaling: once you have reached the largest instance type, you cannot scale any further.
- The server software stack might be unable to take advantage of larger instance types transparently. In particular, single-threaded server technologies such as Node.js cannot transparently exploit higher numbers of CPU cores. Running multiple server processes on a single server instance can help, but then a further mechanism is needed, such as an intervening reverse proxy, to balance traffic among the server processes on that instance.
Multiple, sharded NLBs
AWS advised us that, to scale beyond the 500-target limit, it would be necessary to have multiple NLBs and to shard traffic between them. Each NLB then targets an independent scaling group of server instances. Traffic is routed to the multiple NLBs via a weighted DNS record set; the policy can distribute requests across all the NLBs, for example in a round-robin manner.
The limitations of this approach are:
- Increased cost – having multiple NLBs obviously means paying for multiple NLBs.
- No elasticity – the individual target groups can be elastic, but the number of NLBs itself cannot be elastic: the set of NLBs needs to be sized statically for the maximum anticipated load, and then remains fixed, even when the load would not have required additional NLBs. This exacerbates the cost issue above.
- The number of NLBs needs to be a multiple of the number of Availability Zones (AZs) being targeted. As such, if you want to have instances spread across 3 AZs, you need a multiple of 3 NLBs.
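The client-side effect of the weighted DNS record set can be sketched as follows (hypothetical shard hostnames; with equal weights, weighted resolution behaves like round-robin):

```python
import itertools

# Hypothetical DNS names for the sharded NLBs behind one weighted record set.
nlb_endpoints = [
    "nlb-shard-a.example.com",
    "nlb-shard-b.example.com",
    "nlb-shard-c.example.com",
]

_rotation = itertools.cycle(nlb_endpoints)

def resolve() -> str:
    """Simulate one weighted-DNS resolution with equal weights (round-robin)."""
    return next(_rotation)

# Nine resolutions distribute evenly: three connections land on each shard.
counts: dict[str, int] = {}
for _ in range(9):
    host = resolve()
    counts[host] = counts.get(host, 0) + 1
```

In reality the distribution depends on DNS caching and resolver behaviour at the client, which is precisely why this pushes part of the load balancing problem out to the client's address resolution.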
Having multiple NLBs and sharding the traffic between them is the only way to scale beyond the fixed target group size limit, so we adopted this approach in the Ably NLB-based networking infrastructure despite the aforementioned challenges.
Limit 2: Connection stability
A typical load test has a continuously growing set of subscribers targeting a single endpoint, while monitoring various performance parameters such as connection time and message transit latency. During our tests we occasionally found that most or all of the currently established connections would suddenly drop for no apparent reason.
This specific problem occurred consistently when the number of established connections through a single NLB reached 2 million: connections dropped, and reconnection attempts failed for a period of several minutes before the NLB re-admitted traffic.
Increasing other configuration parameters had no impact: neither the number of server instances, nor the number of client instances (so that each client instance held fewer connections).
It just seemed impossible to maintain that many connections stably through a single NLB.
We discussed the issue with AWS and it was confirmed that this is also due to architectural limits in the NLB design, in this case placing a limit on the number of connections that can be maintained.
However, unlike the target group limit, this constraint is not stated explicitly. There is no warning when the NLB is approaching or has exceeded its stable capacity; instead, tens of thousands of established connections are simply dropped as the NLB presumably crumples under a level of load it was not designed to handle.
Unfortunately, there didn’t appear to be a clear point below which connections would, in fact, be stable. Under 1 million connections things were generally more stable, but mass connection drops would still occasionally happen, with up to a third of all connections dropping at once. It was unclear at what point the NLB could be deemed reliable.
AWS’s advice was to attempt to support no more than 400K simultaneous connections on a single NLB. This is a dramatically more significant constraint than the target group limit; if each server instance is handling 10K connections, that effectively limits each NLB to supporting 40 fully-loaded server instances.
This has an impact on cost, certainly, but also imposes a significant elasticity constraint: absorbing 1 million unexpected connections on demand is no longer possible unless sufficient NLBs have been provisioned ahead of time.
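Putting the two limits together, provisioning becomes a simple capacity calculation. A sketch, using the figures discussed above (the 400K per-NLB ceiling was AWS's advice rather than a documented quota, and 10K connections per instance is an assumed frontend capacity):

```python
import math

MAX_TARGETS_PER_GROUP = 500        # hard target group size limit
SAFE_CONNECTIONS_PER_NLB = 400_000 # AWS-advised per-NLB connection ceiling
CONNECTIONS_PER_INSTANCE = 10_000  # assumed per-frontend capacity
AZ_COUNT = 3                       # NLB count must be a multiple of this

def nlbs_required(total_connections: int) -> int:
    """NLBs needed for a given connection count, honouring both the per-NLB
    connection ceiling and the target group size cap, rounded up to a
    multiple of the number of Availability Zones."""
    instances = math.ceil(total_connections / CONNECTIONS_PER_INSTANCE)
    by_connections = math.ceil(total_connections / SAFE_CONNECTIONS_PER_NLB)
    by_targets = math.ceil(instances / MAX_TARGETS_PER_GROUP)
    needed = max(by_connections, by_targets)
    return math.ceil(needed / AZ_COUNT) * AZ_COUNT
```

Under these assumptions, even 400K connections already requires three NLBs (one per AZ), and 10 million connections requires 27, all of which must be provisioned statically ahead of the load.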
Even below the 400K connection threshold we would still occasionally see situations where some 20% of the connections spontaneously dropped. Those connections can successfully reconnect, but nonetheless this causes unwanted latency for the client, and more work for the backend system.
NLBs are advertised by AWS to distribute incoming traffic across multiple targets, automatically scale to the vast majority of workloads, and handle millions of requests per second.
From our experience, there is indeed no issue with the distribution of traffic to multiple targets. However, NLBs do not yet handle the scale we need in terms of requests per second or automatic scaling. We can support the workloads Ably produces with NLBs, but only by adding complexity to our architecture; the advertised promise that NLBs automatically scale to the workload Ably requires day in, day out is not yet realised.
The limit in the upstream targets that a single NLB can address means you have to use multiple NLBs at higher scale, pushing horizontal scaling further out toward the client, with the client’s address resolution playing a part in the load balancing process.
However, achieving effective elasticity of higher scale systems requires not only the ability to handle a large number of connections, but also connection stability at those higher counts.
It is possible to handle single-digit millions of connections with a small fixed number of NLBs, and these configurations can be effectively elastic since the principal resource pools – the frontend server scaling groups – are fully elastic. But to build systems that scale up by a further order of magnitude, up to 100 million connections, the NLB proposition strains under serious testing.
Most of the time and for most customers, when AWS does things at scale, it is stellar. They live up to what’s advertised and that’s why we trust the dependability of the offering. When an offering isn’t quite up to AWS’s very high standards, it is difficult for us to deal with.
In our experience, anything over 200,000 connections per NLB begins to be a challenge. This is unfortunately substantially below the advertised capability of these load balancers. Many of our individual customers require much, much greater scale, and certainly in aggregate Ably needs massively greater scale. This is in sharp contrast to the AWS offerings we’ve gotten used to and depend on.
At Ably our goal is to provide truly scalable connectivity for our customers’ applications. We want the users of the Ably service/platform to focus on delivering value with other layers of the application stack.
Achieving that scale, with dependability, is an ongoing and significant engineering challenge for ourselves and AWS, with whom we are a Technology Partner. We make it our job to take care of such challenges so our customers don’t have to.
This includes reaching out to our cloud provider and working with them to see what we can do to handle the expected future loads as we grow our capacity.
AWS NLBs are one of several tools we continually evaluate to support our growing technical demands in order to provide a bulletproof enterprise-grade realtime platform/service to our customers and guarantee critical functionality at scale.
To learn more about how we can help you simplify engineering, reduce DevOps overhead, and increase development velocity, talk to our Technical Experts.
More from Ably Engineering
- Migrating from Node Redis to Ioredis: a slightly bumpy but faster road 🆕
- The Mysterious Gotcha of gRPC Stream Performance
- Engineering dependability and fault tolerance in a distributed system
- Achieving exactly-once delivery with Ably
- Adventures in BEAM optimization with our MQTT adapter
- Cassandra counter columns: nice in theory, hazardous in practice