Elasticity, as the term implies, is the ability of software infrastructure to stretch and shrink in line with fluctuating usage. Elasticity matters in many contexts, because usage can always fluctuate, but it’s especially important for sports, media, and entertainment apps, whose user bases frequently grow and shrink rapidly depending on events, trends, and breaking news.
Elasticity is not a new problem, but for realtime update systems, the challenge of building adaptable, elastic infrastructure is different in two major ways:
- Elasticity is harder to deliver because realtime updates tend to be resource-intensive and, especially in sports, media, and entertainment contexts, the number of users can fluctuate dramatically when an important event or game is happening.
- Elasticity is also more important to maintain in realtime contexts because the “live” experience that realtime features typically promise is effectively broken if users can’t access updates.
In a previous article, we walked through the fundamental parts of an effective realtime updates system. Here, we’ll examine elasticity in detail.
Why elasticity is important for sports, media, and entertainment apps
The consequences of elasticity and reliability issues are significantly higher for companies providing realtime updates than for companies in other contexts.
In 2021, for example, Facebook had a widespread network outage that affected billions of users, and the company lost an estimated $60 million in advertising revenue.
The graph below shows that despite the length of the outage, most users logged back in after it was over and traffic continued to grow.
For most companies, this is what downtime is like – outages are bad but rarely deadly.
But it’s different for realtime updates. You can understand this better by returning to user experience and expectations. If a sports app purports to provide live updates and attracts the kinds of sports fans that value live updates, then even small interruptions can compromise the experience.
This expectation persists even as the circumstances around the updates change. Even if realtime updates work reliably across months of steady user activity, users will be upset if the number of users surges for a big event, such as the finale of a competition reality show, and realtime updates struggle under the increased load.
The final song in the latest Eurovision, for example, broke records with 11 million viewers. Users of an app providing realtime updates on the event would have been rightly upset if it couldn’t handle the increased load at the biggest moment.
Strategies for effective realtime update scalability
If you know your industry and users well, then you know the general rate of user engagement and you know about the events that are likely to cause a surge in users. But you also know – even if you can’t predict them – that there will likely be unforeseen moments with big surges.
There are two approaches to handling influxes of users:
- With a manual approach, engineers can add extra servers and perform load testing to ensure that the system will be able to handle the extra load. This can work well for predictable events, such as a scheduled draft or weekly reality show episode.
- With an automatic approach, the system can scale up or down based on user demand and without engineer intervention. The system can handle expected events, but it can also withstand a surprising user surge if a new, exciting player makes a surprise debut in an otherwise normal game.
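The automatic approach can be sketched as a small sizing rule that a control loop evaluates on a schedule. This is a minimal illustration, not a description of any particular autoscaler: the function name `desired_servers`, the per-server target of 10,000 connections, and the minimum and maximum bounds are all hypothetical values chosen for the example.

```python
import math


def desired_servers(active_connections: int,
                    target_per_server: int = 10_000,
                    min_servers: int = 2,
                    max_servers: int = 200) -> int:
    """Return how many servers to run for the current connection count.

    All defaults are illustrative assumptions, not recommendations.
    """
    if target_per_server <= 0:
        raise ValueError("target_per_server must be positive")
    # Round up so that no server is pushed past its target load.
    needed = math.ceil(active_connections / target_per_server)
    # Keep a floor for redundancy and a ceiling to cap cost.
    return max(min_servers, min(max_servers, needed))


# A quiet period, a normal game, and a surprise surge.
for connections in (0, 45_000, 10_000_000):
    print(connections, "->", desired_servers(connections))
```

A production system would likely add more than this sketch shows, such as a cooldown before scaling down so that a brief dip in traffic does not remove capacity just before the next spike.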
The more likely it is that there will be large spikes in usage, the better an automatic approach is going to be. But in either approach, companies building or buying realtime updates infrastructure need to consider:
- Scalability: Global levels of reach need careful planning and design. At this scale, simply adding resources won’t be enough.
- Reliability: Any downtime, even for a global-scale app, will compromise the experience. Realtime update systems need redundancy and backups.
- Security: A DDoS or botnet attack can use the scale of a global app against it, so security efforts need to be effective and resilient.
- Cost: Global-scale apps can be expensive to build and expensive to maintain. Costs can grow further as user bases expand across regions.
The challenges involved in scaling WebSockets provide a good example. WebSocket is a connection protocol that makes bidirectional communication between clients and servers possible, so both sides can send and receive data simultaneously over a single connection.
If you have to scale WebSockets, you need to consider whether you’re scaling horizontally or vertically. From there, you need to build load balancing, fallback, and connection plans.
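To make the load balancing part concrete, here is a minimal sketch of one common strategy for horizontally scaled WebSocket servers: sending each new connection to the server with the fewest open connections. Because WebSocket connections are long-lived, counting open connections can spread load more evenly than simply rotating through servers. The class `LeastConnectionsBalancer` and the server names are hypothetical, and the sketch keeps its state in memory, which a real deployment would not do.

```python
from dataclasses import dataclass, field


@dataclass
class LeastConnectionsBalancer:
    """Tracks open connections per server and routes new ones to the least loaded."""
    connections: dict[str, int] = field(default_factory=dict)

    def add_server(self, name: str) -> None:
        # Scaling out: a new server starts with zero connections.
        self.connections.setdefault(name, 0)

    def remove_server(self, name: str) -> int:
        """Scale in; returns how many clients will need to reconnect elsewhere."""
        return self.connections.pop(name, 0)

    def connect(self) -> str:
        if not self.connections:
            raise RuntimeError("no servers available")
        # min() returns the first minimum in insertion order, so ties are stable.
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1
        return server

    def disconnect(self, server: str) -> None:
        if self.connections.get(server, 0) > 0:
            self.connections[server] -= 1


balancer = LeastConnectionsBalancer()
balancer.add_server("ws-1")
balancer.add_server("ws-2")
print([balancer.connect() for _ in range(3)])
```

Note what happens when a server is added during a surge: it starts at zero connections, so new clients flow to it first, but existing connections stay where they are. That is one reason fallback and reconnection plans have to be designed alongside load balancing.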
And that’s just to start: the complexity multiplies the more unpredictable your user base is. WebSockets, even after all the work only touched on here, are still only one part of a larger scalability strategy.
Scalable realtime updates with Ably
Building realtime update infrastructure is daunting because the work is difficult and the stakes are high. There is a multitude of edge cases, and problems tend to compound, making scalability perpetually difficult.
Elasticity also tends to resist simple solutions. Adding new servers, for example, can improve scalability, but it can also make data integrity harder to maintain.
At Ably, we’ve built a data broadcast solution that is both elastic and highly available. Our solution is informed by years of work across multiple industries with a variety of needs.
We’ve built a mathematically grounded design that ensures capacity and availability – allowing developers to maintain levels of scalability and elasticity that meet all of their users’ realtime expectations.
Learn more about our data broadcast solution to see how companies like Metra and BlueJeans support dynamic, data-driven experiences with Ably.
This is the last in a series of four blog posts about the three fundamentals for building realtime updates in sports, media, and entertainment apps. In other posts, we look at why low latency and data integrity are so important when you're trying to deliver realtime updates to end users at scale.