In realtime applications, it goes without saying that we need information from our servers as soon as it’s available – and, fundamentally, the classic HTTP request/response paradigm isn’t up to the job. That’s because the server stays silent, new data or not, until a client requests an update.
That limitation saw the emergence of all manner of hacks and workarounds as devs sought to adapt that request/response model to the demands of a more dynamic, realtime web – some of which became formalized and pretty widely adopted.
All these technologies and approaches – from Comet to HTTP long polling – have one thing in common: essentially, they set out to create the illusion of truly realtime (event-driven) communication, so that when the server has new data, it can send a response straight away.
Even though HTTP is not an event-driven protocol – and so not truly realtime – these approaches actually work quite well in specific use cases, Gmail chat for instance. Problems emerge, however, in low-latency applications or at scale, mainly because of the processing demands associated with HTTP.
That is, with HTTP you have to continuously request updates (and get a response back), which is very resource-intensive: the client establishes a connection, requests an update, gets a response from the server, then closes the connection. Imagine this process repeated endlessly by thousands of concurrent users – it’s incredibly taxing on the server at scale.
It is these issues that ultimately drove developers Michael Carter and Ian Hickson to develop WebSockets – essentially a thin transport layer built on top of a device’s TCP/IP stack. The intent was to provide what is essentially a TCP communication layer to web applications that's as close to raw as possible, bar a few abstractions to eliminate certain security-based complications and other concerns.
This post will look at some of the techniques used to bypass the limitations of the HTTP request/response paradigm in realtime applications, the issues associated with each, and how WebSockets can help overcome them.
HTTP is essentially a request/response protocol in the client-server computing model, and the primary communication mode of the World Wide Web. The original version, proposed as an application protocol in 1989 by Tim Berners-Lee, was very limited, and quickly modified to support wider browser and server functionality.
Those modifications were eventually documented by the HTTP Working Group in 1996 as HTTP/1.0 (RFC 1945) – though HTTP/1.0 is not considered a formal specification or an Internet standard.
HTTP/1.1 is the most widely supported version in web browsers and servers and its arrival was a big step forward, because it enabled some pretty important optimizations and enhancements, from persistent and pipelined connections, to new request/response header fields. Chief among them are two headers that are the basis for many of the improvements that have helped to enable a more dynamic, realtime web:
The Keep-Alive header: Used to set up persistent communications between hosts. That means the connection can be reused for more than one request, which reduces request latency perceptibly because the client does not need to re-negotiate the TCP three-way handshake after the first request has been sent. Another positive side effect is that, in general, the connection becomes faster with time due to TCP’s slow-start mechanism. Prior to HTTP/1.1, you had to open a new connection for every single request/response pair.
The Upgrade header: Used to upgrade the connection to an enhanced protocol mode (such as WebSockets).
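To make these two headers concrete, here is roughly what they look like on the wire. The endpoint, host, and placeholder values are illustrative; the header names themselves come from HTTP/1.1 and RFC 6455:

```http
# Persistent connection (HTTP/1.1):
Connection: keep-alive
Keep-Alive: timeout=5, max=100

# Upgrading a connection to WebSockets:
GET /chat HTTP/1.1
Host: example.com
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: <16-byte random nonce, base64-encoded>
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Accept: <derived from the client key, see RFC 6455>
```

After the 101 response, the same TCP connection stops speaking HTTP and starts speaking the WebSocket protocol.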
HTTP Polling represented a step up from the classic request/response mechanism – though, while there are various versions of polling, only long polling is in any way applicable to realtime applications.
For instance, HTTP short polling uses an AJAX-based timer to ensure that client devices send server requests at fixed intervals. However, the server will still respond immediately to each request, either providing new data or sending an ‘empty’ response if there is no new data, before closing the connection. So, it’s really not much use at all in realtime applications, when the client needs to know about new data as soon as it’s available.
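The short polling pattern can be sketched in a few lines. This is a minimal, hypothetical illustration – `fetch` stands in for a full HTTP request/response cycle against an imaginary updates endpoint:

```python
import time

def short_poll(fetch, interval_s, max_polls):
    """Ask the server for updates at a fixed interval.

    `fetch` stands in for a complete HTTP GET request/response cycle;
    it returns new data, or None for an 'empty' response.
    """
    received = []
    for _ in range(max_polls):
        data = fetch()          # a full connection/request/response each time
        if data is not None:
            received.append(data)
        time.sleep(interval_s)  # data arriving mid-interval just waits here
    return received

# A stand-in server: data only becomes available on the third poll.
responses = iter([None, None, "update-1", None])
result = short_poll(lambda: next(responses), interval_s=0.0, max_polls=4)
# result == ["update-1"] – but it arrived only when the client happened to ask
```

Note that three of the four request/response cycles were pure overhead, which is exactly why this pattern scales so poorly.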
It was that limitation which led to the development of HTTP long polling, which is essentially a technique designed to emulate a server push feature.
We’ve covered HTTP long polling in detail here, but in essence long polling is a technique where the server elects to hold a client’s connection open for as long as possible (usually up to 20 seconds), delivering a response only after either the data becomes available or a timeout threshold is reached.
The main advantage of long polling is that new information is, in theory, sent to the client as soon as it’s available. The downside, however, is the overhead that comes with processing HTTP requests, which can create a host of issues at scale.
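The server side of long polling can be sketched as a blocking wait on a data event. The class and endpoint names here are hypothetical, and a real implementation would handle many clients, but the core hold-open-until-data-or-timeout logic looks like this:

```python
import threading

class LongPollEndpoint:
    """Sketch of the server side of long polling: hold the client's
    request open until data arrives or a timeout elapses."""

    def __init__(self):
        self._event = threading.Event()
        self._data = None

    def publish(self, data):
        self._data = data
        self._event.set()  # wake any request currently being held open

    def handle_request(self, timeout_s=20.0):
        # Block until publish() fires or the timeout threshold is reached.
        if self._event.wait(timeout=timeout_s):
            self._event.clear()
            return self._data
        return None  # timed out: the client immediately re-issues the request

endpoint = LongPollEndpoint()
# Simulate data arriving 50 ms after the client's request is held open.
threading.Timer(0.05, endpoint.publish, args=["price-tick"]).start()
response = endpoint.handle_request(timeout_s=1.0)
# response == "price-tick", delivered as soon as it was published
```

Even so, every delivery still costs a full HTTP request/response cycle, which is the overhead referred to above.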
HTTP streaming is a push-style data transfer technique that allows a web server to continuously send data to a client over a single HTTP connection that remains open indefinitely. Essentially, the client makes an HTTP request, and the server pushes out a response of indefinite length.
However, while HTTP streaming is performant, easy to consume and can be an alternative to WebSockets, it does have limitations. The main issue from a realtime perspective is that an intermediary can interrupt the connection – whether through timeout or simply because it is serving multiple requests ‘round-robin style’ – so it is not always possible to guarantee realtime-ness.
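One common way a server produces such an indefinite-length response is HTTP/1.1 chunked transfer encoding. The sketch below shows the framing only – each event becomes its own chunk (hex length, CRLF, payload, CRLF), and the zero-length chunk is sent only when the stream finally ends:

```python
def encode_chunk(payload: bytes) -> bytes:
    """Frame one piece of data as an HTTP/1.1 chunk."""
    return hex(len(payload))[2:].encode() + b"\r\n" + payload + b"\r\n"

def stream_events(events):
    """Sketch of an HTTP streaming response body: the server emits
    each event as its own chunk over one long-lived connection."""
    for event in events:
        yield encode_chunk(event)
    yield b"0\r\n\r\n"  # terminating chunk, only when the stream ends

body = b"".join(stream_events([b"tick-1", b"tick-2"]))
# body == b'6\r\ntick-1\r\n6\r\ntick-2\r\n0\r\n\r\n'
```

An intermediary that buffers or times out this response is exactly the failure mode described above.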
HTTP/2.0 evolved from an experimental protocol – SPDY – which was originally announced by Google in 2009. By 2015, the HTTP Working Group had published HTTP/2.0 as a Proposed Standard, having taken the SPDY specification as its starting point.
We’ve covered HTTP/2.0 in detail before, but it was essentially a performance update designed to improve the speed of web communications. The main developments in the context of realtime communication were:
Multiplexing: Rather than transporting data in plaintext format, data is encoded as binary and encapsulated inside frames which can be multiplexed along bidirectional channels known as streams – all over a single TCP connection. This allows for many parallel requests/responses to take place concurrently
Server push: Server push is a performance feature that allows a server to send responses to an HTTP/2-compliant client before the client requests them. This feature is useful when the server knows that the client needs the ‘pushed’ responses to process the original request fully.
Despite those, and other, steps forward, the explosion of internet traffic driven by the massive uptake of mobile devices has seen HTTP/2.0 struggle to provide a smooth, transparent web browsing experience – especially under the ever-increasing demands of realtime applications and their users.
All browsers support the HTTP/2 protocol over HTTPS, provided an SSL/TLS certificate is installed.
HTTP/2 allows the client to send all requests concurrently over a single TCP connection. Theoretically, the client should receive the resources faster.
TCP is a reliable, stable connection protocol.
Concurrent requests can increase the load on the servers. HTTP/2 servers can receive requests in large batches, which can lead to requests timing out. The issue of server load spiking can be solved by inserting a load balancer or a proxy server, which can throttle the requests.
Server support for HTTP/2 prioritization is not yet mature. Software support is still evolving. Some CDNs or load balancers may not support prioritization properly.
The HTTP/2 push feature can be tricky to implement correctly.
HTTP/2 addressed HTTP head-of-line blocking, but TCP-level blocking can still cause problems.
HTTP/3 is a new iteration of HTTP, which has been in development since 2018. Although it was still a draft standard at the time of writing (October 2020), some browsers are already making use of it.
The aim of HTTP/3 is to provide fast, reliable, and secure web connections across all forms of devices by straightening out the transport-related issues of HTTP/2. To do this, it uses a different transport-layer protocol, QUIC, which runs over the User Datagram Protocol (UDP) rather than the TCP used by all previous versions of HTTP.
There are already some potential issues with HTTP/3 starting to emerge. For instance:
Transport layer ramifications. Transitioning to HTTP/3 involves not only a change in the application layer, but also a change in the underlying transport layer. Hence, adoption of HTTP/3 is a bit more challenging compared to its predecessor.
Reliability and data integrity issues. UDP is generally suitable for applications where packet loss is acceptable, because UDP does not guarantee that your packets will arrive in order – in fact, it does not guarantee that your packets will arrive at all. So if data integrity is important to your use case and you’re using HTTP/3, you will have to build mechanisms to ensure message ordering and guaranteed delivery.
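What such an application-level ordering mechanism might look like can be sketched with sequence numbers: each message carries its position, and delivery is held back until any gap is filled. This is a minimal, hypothetical illustration, not how QUIC itself (which provides ordering per stream) is implemented:

```python
class ReorderBuffer:
    """Minimal sketch of application-level ordering on top of an
    unreliable transport: messages carry sequence numbers, and
    delivery is held back until the gap is filled."""

    def __init__(self):
        self._next = 0      # next sequence number we can deliver
        self._pending = {}  # out-of-order messages parked by seq

    def receive(self, seq, msg):
        """Accept one message; return all messages now deliverable in order."""
        self._pending[seq] = msg
        deliverable = []
        while self._next in self._pending:
            deliverable.append(self._pending.pop(self._next))
            self._next += 1
        return deliverable

buf = ReorderBuffer()
out_of_order = buf.receive(1, "b")   # held back: seq 0 hasn't arrived
in_order = buf.receive(0, "a")       # gap filled: both delivered, in order
```

A real mechanism would also need acknowledgements and retransmission to cover packets that never arrive at all.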
On the other hand, the introduction of the new transport protocol – QUIC running over UDP – means a decrease in latency, both theoretically and, so far, experimentally.
Because UDP does not perform error checking and correction in the protocol stack, it is suitable for use cases where these are either not required or are performed in the application. This means UDP avoids any associated overhead. UDP is often used in time-sensitive applications, such as realtime systems, which cannot afford to wait for packet retransmission and therefore tolerate some dropped packets.
Reliability issues. UDP applications tend to lack reliability: a degree of packet loss, re-ordering, errors, or duplication must be accepted. It is up to the end-user application to provide any necessary handshaking, such as realtime confirmation that a message has been received.
HTTP/3 is not yet fully standardized.
WebSockets allow both the server and the client to push messages at any time, with no relation to a previous request. One notable advantage of WebSockets is that almost every browser supports them.
WebSocket solves a few issues with HTTP:
Bi-directional protocol – either client or server can send a message to the other party (in HTTP, the request is always initiated by the client and the response is processed by the server, making HTTP a uni-directional protocol)
Full-duplex communication – client and server can talk to each other independently at the same time.
Single TCP connection – after the initial HTTP connection is upgraded, client and server communicate over that same TCP connection (a persistent connection) for the lifetime of the WebSocket connection.
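The upgrade itself is cheap for the server to complete. Per RFC 6455, the server proves it understood the WebSocket request by concatenating the client’s Sec-WebSocket-Key with a fixed GUID, hashing the result with SHA-1, and base64-encoding the digest into the Sec-WebSocket-Accept response header. A minimal sketch:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must send
    back in its '101 Switching Protocols' response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Example key taken from RFC 6455's handshake example.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

Once the client verifies the accept value, both sides switch from HTTP to the WebSocket framing protocol on the same TCP connection.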
WebSocket is an event-driven protocol, which means you can use it for truly realtime communication. Unlike HTTP, where you have to constantly request updates, with WebSockets updates are sent as soon as they are available.
WebSockets keep a single, persistent connection open, eliminating the latency problems that arise with HTTP request/response-based methods.
WebSockets generally do not use XMLHttpRequest, so headers are not sent every time we need more information from the server. This, in turn, reduces the amount of expensive data sent to the server.
WebSockets don’t automatically recover when connections are terminated – this is something you need to implement yourself, and is part of the reason why there are many client-side libraries in existence.
Browsers released before 2011 aren’t able to support WebSocket connections – but this is increasingly less relevant.
Generally, WebSockets will be the better choice in the context of realtime, ongoing communication.
HTTP-based techniques tend to be much more resource-intensive on servers, whereas WebSockets have an extremely lightweight footprint. Meanwhile, approaches like long polling also require many hops between servers and devices, and these gateways often have different ideas of how long a typical connection is allowed to stay open; if a connection stays open too long, something along the path may kill it – possibly while it is doing something important.
Why you should build with WebSockets:
WebSockets are event-driven (unlike HTTP). Arguably, event-driven is a prerequisite for true realtime.
Full-duplex asynchronous messaging. In other words, both the client and the server can stream messages to each other independently.
Good security model (origin-based security model)
Wide support – WebSockets were supported by 98% of browsers globally as of October 2020.
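The full-duplex point above is worth making concrete: over a WebSocket connection, sending and receiving are fully independent of each other. The sketch below emulates that shape in-process with a pair of queues standing in for the two directions of the socket – a hypothetical illustration of the messaging pattern, not of the wire protocol:

```python
import asyncio

async def peer(name, inbox, outbox, to_send):
    """Either side of a full-duplex link: its sender and receiver
    run concurrently and never wait on each other."""
    async def sender():
        for msg in to_send:
            await outbox.put(f"{name}:{msg}")
        await outbox.put(None)  # sentinel: this side is done sending

    async def receiver():
        received = []
        while (msg := await inbox.get()) is not None:
            received.append(msg)
        return received

    _, received = await asyncio.gather(sender(), receiver())
    return received

async def main():
    a_to_b, b_to_a = asyncio.Queue(), asyncio.Queue()
    # Both peers stream messages at the same time, in both directions.
    return await asyncio.gather(
        peer("client", b_to_a, a_to_b, ["hi", "subscribe"]),
        peer("server", a_to_b, b_to_a, ["ack", "tick"]),
    )

client_got, server_got = asyncio.run(main())
# Each side received the other's stream without ever issuing a request.
```

Contrast this with HTTP, where the server could only speak inside the window of a client-initiated request.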
There is, however, a lot involved when implementing support for WebSockets with HTTP-based techniques like long polling as a fallback. For instance, beyond the client and server implementation details, you also have to build in support for other transports to ensure robust support for different client environments, as well as broader concerns, such as authentication and authorization, guaranteed message delivery, reliable message ordering, historical message retention, and more.