Nowadays, WebSockets are a preferred choice for many organizations and developers seeking to build interactive, realtime digital features that provide delightful user experiences. But how did WebSockets come about? In this article, we will discuss what led to the emergence of the WebSocket technology, and how it’s an improvement on HTTP-based realtime techniques.
What is WebSocket?
In a nutshell, WebSocket is a realtime web technology that enables bidirectional, full-duplex communication between client and server over a persistent connection. The WebSocket connection is kept alive for as long as needed (in theory, it can last forever), allowing the server and the client to send data at will, with minimal overhead.
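To make "send data at will" concrete, here is a minimal sketch of two-way messaging over one connection. `SocketLike` and `openChannel` are hypothetical names introduced for illustration; `SocketLike` mirrors only the `send` and `addEventListener("message", ...)` parts of the standard browser `WebSocket` interface so that the sketch runs without a network.

```typescript
// Minimal subset of the browser WebSocket interface used by this sketch.
interface SocketLike {
  send(data: string): void;
  addEventListener(
    type: "message",
    listener: (event: { data: unknown }) => void
  ): void;
}

// Hypothetical helper: registers one handler for incoming text messages and
// returns a function for outgoing text, both sharing the same connection.
function openChannel(
  socket: SocketLike,
  onText: (text: string) => void
): (text: string) => void {
  socket.addEventListener("message", (event) => {
    // Binary frames are ignored here; only text payloads are forwarded.
    if (typeof event.data === "string") onText(event.data);
  });
  return (text) => socket.send(text);
}
```

In a browser, the socket would be something like `new WebSocket("wss://example.com/feed")` (a placeholder URL), and the returned function should only be called once the socket's `open` event has fired.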
The road to WebSocket: AJAX and Comet
The first realtime web apps started to appear in the 2000s, attempting to deliver responsive, dynamic, and interactive end-user experiences. However, at that time, the realtime web was difficult to achieve and slower than we’re used to nowadays; it was delivered by hacking existing HTTP-based technologies like AJAX and Comet that were not designed and optimized for realtime applications.
AJAX (short for Asynchronous JavaScript and XML) is a method of asynchronously exchanging data with a server in the background and updating parts of a web page — without the need for an entire page refresh (postback). Publicly used as a term for the first time in 2005, AJAX encompasses several technologies:
HTML (or XHTML) and CSS for presentation.
Document Object Model (DOM) for dynamic display and interaction.
XML or JSON for data interchange, and XSLT for XML manipulation.
XMLHttpRequest (XHR) object for asynchronous communication.
JavaScript to bind everything together.
The diagram below shows how AJAX works compared to the classic model of building a web app with HTTP.
In a classic model, most user actions in the UI trigger an HTTP request sent to the server. The server processes the request and returns the entire HTML page to the client. In comparison, AJAX introduces an intermediary (an AJAX engine) between the user and the server. Although it might seem counterintuitive, the intermediary improves responsiveness. At the start of the session, instead of loading the web page, the client loads the AJAX engine, which is responsible for:
Regularly polling the server on the client’s behalf.
Rendering the interface the user sees, and updating it with data retrieved from the server.
AJAX (and the XMLHttpRequest object in particular) can be considered a black swan event for the web. It opened up the potential for web developers to start building truly dynamic, asynchronous, realtime-like web applications that could communicate with the server silently in the background, without interrupting the user’s browsing experience. Google was among the first to adopt the AJAX model in the mid-2000s, initially using it for Google Suggest, and its Gmail and Google Maps products. This sparked widespread interest in AJAX, which quickly became popular and heavily used.
Comet
Coined in 2006, Comet is a web application design model that allows a web server to push data to the browser. Similar to AJAX, Comet enables asynchronous communication. Unlike classic AJAX (where the client periodically polls the server for updates), Comet uses long-lived HTTP connections to allow the server to push updates whenever they’re available.
The Comet model was made famous by organizations such as Google and Meebo. The former initially used Comet to add web-based chat to Gmail, while Meebo used it for their web-based chat app that enabled users to connect to AOL, Yahoo, and Microsoft chat platforms through the browser. In a short time, Comet became a default standard for building responsive, interactive web apps.
Several different techniques can be used to deliver the Comet model, the most well-known being long polling and HTTP streaming.
Essentially a more efficient form of polling, long polling is a technique where the server elects to hold a client’s connection open for as long as possible, delivering a response only after data becomes available or a timeout threshold is reached.
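The client side of long polling can be sketched as a loop that re-issues a request as soon as the previous one completes. This is an illustration under assumptions, not a reference implementation: `PollOnce` is a hypothetical transport function standing in for one held-open HTTP request, the cursor simply counts messages received, and `maxRequests` exists only so the loop terminates in a demo.

```typescript
// Outcome of one long-poll request: new messages, or the server's timeout.
type PollResult = { kind: "data"; messages: string[] } | { kind: "timeout" };

// Hypothetical transport: one HTTP request that the server holds open until
// data is available or its timeout threshold is reached.
type PollOnce = (cursor: number) => Promise<PollResult>;

async function longPoll(
  pollOnce: PollOnce,
  onMessage: (message: string) => void,
  maxRequests: number
): Promise<number> {
  let cursor = 0; // assumption: the server resumes from a message count
  for (let i = 0; i < maxRequests; i++) {
    const result = await pollOnce(cursor);
    if (result.kind === "data") {
      for (const message of result.messages) {
        onMessage(message);
        cursor++;
      }
    }
    // After data or a timeout alike, the loop immediately sends a new request.
  }
  return cursor;
}
```

Every delivery still costs a full HTTP request and response, which is the overhead discussed in the limitations section.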
Also known as HTTP server push, HTTP streaming is a data transfer technique that allows a web server to continuously send data to a client over a single HTTP connection that remains open indefinitely. Whenever there’s an update available, the server sends a response, and only closes the connection when explicitly told to do so. HTTP streaming is commonly implemented using Server-Sent Events (SSE).
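To show what travels over such a stream, here is a simplified parser for the `text/event-stream` format that SSE uses: fields such as `event:` and `data:` appear one per line, and a blank line ends each event. `parseEventStream` is a hypothetical helper; it deliberately ignores the `id` and `retry` fields and works on a complete string rather than incremental chunks, so treat it as a sketch rather than a spec-complete implementation.

```typescript
interface ServerSentEvent {
  event: string; // event type; "message" when the stream does not name one
  data: string;  // all data lines of the event, joined with "\n"
}

function parseEventStream(text: string): ServerSentEvent[] {
  const events: ServerSentEvent[] = [];
  let eventType = "message";
  let dataLines: string[] = [];

  for (const line of text.split(/\r\n|\n|\r/)) {
    if (line === "") {
      // A blank line dispatches the buffered event, if it carries any data.
      if (dataLines.length > 0) {
        events.push({ event: eventType, data: dataLines.join("\n") });
      }
      eventType = "message";
      dataLines = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment line, often used as keep-alive

    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space

    if (field === "event") eventType = value;
    else if (field === "data") dataLines.push(value);
    // Other fields ("id", "retry") are skipped in this sketch.
  }
  return events;
}
```

Note that the stream only flows from server to client; anything the client wants to send still needs a separate HTTP request.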
AJAX and Comet limitations
AJAX and Comet paved the way for creating dynamic, realtime web apps. However — even though they continue to be used nowadays, to a lesser extent — both AJAX and Comet have their shortcomings.
Most of their limitations stem from using HTTP as the underlying transport protocol. The problem is that HTTP was initially designed to serve hypermedia resources in a request/response fashion. It hadn’t been optimized to power realtime apps that usually involve high-frequency or ongoing client-server communication, and the ability to react instantly to changes.
Hacking HTTP-based technologies to emulate the realtime web was bound to lead to all sorts of drawbacks, such as:
Limited scalability
HTTP polling, for example, involves sending requests to the server at fixed intervals to see if there’s any new update to retrieve. High polling frequencies result in increased network traffic and server demands. This doesn’t scale well, especially as the number of concurrent users rises.
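A quick back-of-the-envelope calculation shows why this grows uncomfortable. The function below is a hypothetical helper, and the user counts and intervals in the usage note are made-up inputs, not measurements.

```typescript
// Average request rate the server must absorb when every connected user
// polls at a fixed interval, regardless of whether anything has changed.
function pollingRequestsPerSecond(
  concurrentUsers: number,
  pollIntervalSeconds: number
): number {
  if (pollIntervalSeconds <= 0) {
    throw new RangeError("pollIntervalSeconds must be positive");
  }
  return concurrentUsers / pollIntervalSeconds;
}
```

With these assumed inputs, 10,000 users polling every 2 seconds produce 5,000 requests per second, and ten times the users produce ten times the load, even if most responses contain no new data.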
Unreliable message ordering
Reliable message ordering can be an issue, since it’s possible for multiple HTTP requests from the same client to be in flight simultaneously. Due to various factors, such as unreliable network conditions, there’s no guarantee that the requests issued by the client and the responses returned by the server will reach their destination in the right order.
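One common workaround, shown here only as a sketch, is to tag each message with a sequence number and have the receiver hold back anything that arrives early. The `ReorderBuffer` class and its zero-based, gap-free numbering scheme are assumptions made for illustration; the article does not prescribe this approach.

```typescript
// Delivers items in sequence order even when they arrive out of order.
class ReorderBuffer<T> {
  private nextSeq = 0;
  private readonly pending = new Map<number, T>();
  private readonly deliver: (item: T) => void;

  constructor(deliver: (item: T) => void) {
    this.deliver = deliver;
  }

  receive(seq: number, item: T): void {
    if (seq < this.nextSeq) return; // already delivered: drop the duplicate
    this.pending.set(seq, item);
    // Flush every consecutive item that is now available.
    while (this.pending.has(this.nextSeq)) {
      this.deliver(this.pending.get(this.nextSeq) as T);
      this.pending.delete(this.nextSeq);
      this.nextSeq++;
    }
  }
}
```

The cost is extra bookkeeping in application code, plus added delay whenever an early message has to wait for a late one.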
Increased latency
The time required to establish a new HTTP connection is significant since it involves a handshake with a few back-and-forth exchanges between the client and the server. In addition to the slow start, we must also consider the impact of HTTP headers, which often outweigh the core data being delivered, increasing message size and causing delays.
No bidirectional streaming
A request/response protocol by design, HTTP doesn’t support bidirectional, always-on, realtime communication between client and server over the same connection.
Enter WebSocket
With the web continuously evolving, and user expectations of rich, realtime web-based experiences growing, it was becoming increasingly obvious that an alternative to HTTP was needed.
In 2008, the pain and limitations of using Comet when implementing anything resembling realtime were being felt particularly keenly by developers Michael Carter and Ian Hickson. Through collaboration on IRC and W3C mailing lists, they came up with a plan to introduce a new standard for modern, truly realtime communication on the web. Thus, the name “WebSocket” was coined.
Comparing WebSocket and HTTP
A WebSocket connection starts as an HTTP request/response handshake. After a successful WebSocket handshake, there are no more requests and responses. Instead, the connection is persistent, allowing both the WebSocket server and the WebSocket client to send low-latency messages at will, with minimal overhead. This gives WebSockets a significant performance boost and makes them a much better choice than HTTP for building realtime apps.
The table below highlights the key conceptual differences between WebSocket and HTTP.
Criteria | WebSocket | HTTP/1.1 |
---|---|---|
Architecture | Event-driven | Request-driven |
Data transmission | Full-duplex | Half-duplex |
Messaging pattern | Bidirectional | Request-response |
Server push | Core feature | Not natively supported; you have to use polling or streaming techniques to emulate this capability. |
Overhead | Moderate overhead to establish the connection, and minimal overhead per message. | Moderate overhead per request/connection. |
State | Stateful | Stateless |
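
The "minimal overhead per message" entry can be quantified. Under RFC 6455, a WebSocket frame header is 2 bytes, plus 2 or 8 bytes of extended length for larger payloads, plus a 4-byte masking key on frames sent from client to server. The helper name below is hypothetical; the byte counts follow the RFC's framing rules.

```typescript
// Size in bytes of a WebSocket frame header for a given payload size.
// Client-to-server frames are masked; server-to-client frames are not.
function webSocketFrameHeaderBytes(payloadBytes: number, masked: boolean): number {
  let header = 2; // FIN/opcode byte plus mask bit and 7-bit length
  if (payloadBytes > 65535) header += 8;    // 64-bit extended payload length
  else if (payloadBytes > 125) header += 2; // 16-bit extended payload length
  if (masked) header += 4;                  // masking key
  return header;
}
```

A 20-byte chat message therefore carries 6 bytes of framing from the client and 2 bytes from the server, whereas each HTTP request and response repeats its full set of headers.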
WebSocket standardization and early adoption
The WebSocket interface made its way into the HTML5 specification, which was first released as a draft in January 2008. The WebSocket protocol was standardized in 2011 via RFC 6455.
In December 2009, Google Chrome 4 was the first browser to ship full support for WebSockets. Other browser vendors started to follow suit over the next few years; today, almost all modern browsers offer WebSocket support. Going beyond browser web pages, WebSockets can be used for realtime communication across various types of user agents — for example, mobile apps.
What are WebSockets used for?
WebSockets offer low-latency communication capabilities which are suitable for various types of realtime use cases. For example, you can use WebSockets to:
Power live chat experiences.
Broadcast realtime event data, such as live scores and traffic updates.
Facilitate multiplayer collaboration on shared projects and whiteboards.
Deliver notifications and alerts.
Keep your backend and frontend in realtime sync.
Add live location tracking to urban mobility and food delivery apps.
The evolution of WebSocket
It’s been more than a decade since WebSocket was conceived. Since then, WebSocket has matured into one of the key technologies powering the realtime web. Nowadays, WebSocket libraries and frameworks are available for most widely used programming languages. Most of them are built on top of the raw WebSocket protocol and provide additional capabilities, making it easier for developers to add WebSocket-based functionality to their apps.
Take, for example, Socket.IO — one of the most popular open-source realtime libraries. It uses raw WebSockets as a foundation, while offering some extra features, such as:
Fallback to HTTP long polling for environments where WebSockets aren’t supported (e.g., corporate networks with proxy servers).
Disconnection detection and automatic reconnections.
Multiplexing (namespaces).
Broadcasting to all clients, or a subset of clients via rooms.
Acknowledgments (via callbacks).
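As an illustration of what automatic reconnection involves, here is a generic exponential backoff with jitter. It is not Socket.IO's actual algorithm or defaults: the function name, the 1-second base, and the 30-second cap are all assumptions chosen for the sketch, and the random source is injectable so the behavior can be checked deterministically.

```typescript
// Delay before reconnection attempt number `attempt` (0-based).
// The ceiling doubles with each attempt up to `maxMs`; jitter then picks a
// point below that ceiling so many clients do not reconnect in lockstep.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

A client would wait `reconnectDelayMs(attempt)` milliseconds after each failed connection attempt and reset `attempt` to zero once a connection succeeds.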
While it’s true that open-source libraries simplify your experience working with WebSockets (as opposed to using the raw WebSocket protocol), you still have to manage WebSocket infrastructure yourself. This can be tricky to do at scale; have a read of the challenges of scaling Socket.IO to get an idea of what’s involved.
If you want to avoid the challenges and costs of scaling and maintaining WebSocket infrastructure in-house, you can offload this complexity to a managed third-party PaaS such as Ably.
Ably, the WebSocket platform that works reliably at any scale
Ably is a realtime experience infrastructure provider. Our APIs and SDKs help developers build and deliver realtime experiences without having to worry about maintaining and scaling messy WebSocket infrastructure.
Key Ably features and capabilities:
Pub/sub messaging over serverless WebSockets, with rich features such as message delta compression, automatic reconnections with continuity, user presence, message history, and message interactions.
A globally-distributed network of datacenters and edge acceleration points-of-presence.
Guaranteed message ordering and delivery.
Global fault tolerance and a 99.999% uptime SLA.
< 65ms round-trip latency (P99).
Dynamic elasticity, so we can quickly scale to handle any demand (billions of WebSocket messages sent to millions of pub/sub channels and WebSocket connections).
Explore our documentation to find out more and get started with a free Ably account.