19 min read · Updated May 17, 2024

The WebSocket API and protocol explained

Written by: Alex Diaconu

WebSocket marks a turning point for web development. Designed to be event-driven and optimized for low latency, WebSocket has become a preferred choice for many organizations and developers building interactive, realtime digital experiences. This article explores key WebSocket-related topics, from how the protocol and the API work to their pros, cons, and alternatives.

WebSocket: The protocol and API explained

WebSocket is a realtime technology that enables bidirectional, full-duplex communication between client and server over a persistent, single-socket connection. The WebSocket connection is kept alive for as long as needed (in theory, it can last forever), allowing the server and the client to send data at will, with minimal overhead.

The WebSocket technology consists of two core building blocks:

  • The WebSocket protocol 

  • The WebSocket API

What is the WebSocket protocol?

The WebSocket protocol enables ongoing, full-duplex, bidirectional communication between a web client and a web server over an underlying TCP connection. It is designed to let clients and servers communicate in realtime, which makes for efficient and responsive data transfer in web applications.

In December 2011, the Internet Engineering Task Force (IETF) standardized the WebSocket protocol through RFC 6455. In coordination with IETF, the Internet Assigned Numbers Authority (IANA) maintains the WebSocket Protocol Registries, which define many of the codes and parameter identifiers used by the protocol.

What is the WebSocket API?

Defined in the WHATWG WebSockets Living Standard (it was previously part of the HTML Living Standard), the WebSocket API is a programming interface for creating WebSocket connections and managing the data exchange between a client and a server in a web app. It provides a simple and standardized way for developers to use the WebSocket protocol in their applications.

Nowadays, almost all modern browsers support the WebSocket API. Additionally, there are plenty of frameworks and libraries — both open-source and commercial solutions — that implement WebSocket APIs. 

What are WebSockets used for? 

WebSockets offer low-latency communication capabilities which are suitable for various types of realtime use cases. For example, you can use WebSockets to: 

  • Power live chat experiences. 

  • Broadcast realtime event data, such as live scores and traffic updates.

  • Facilitate multiplayer collaboration on shared projects and whiteboards.

  • Deliver notifications and alerts.

  • Keep your backend and frontend in realtime sync.

  • Add live location tracking capabilities to urban mobility and food delivery apps.

Learn more about WebSocket use cases

How do WebSockets work?

At a high level, working with WebSockets involves three main steps:

  • Opening a WebSocket connection. The process of establishing a WebSocket connection is known as the opening handshake, and consists of an HTTP request/response exchange between the client and the server. See how to establish a WebSocket connection for more details.  

  • Data transmission over WebSockets. After a successful opening handshake, the client and server can exchange messages (each made up of one or more frames) over the persistent WebSocket connection. WebSocket messages may contain string (plain text) or binary data. Learn more about data transmission over WebSockets.

  • Closing a WebSocket connection. Once the persistent WebSocket connection has served its purpose, it can be terminated; both the client and the server can initiate the closing handshake by sending a close message. Read more about closing a WebSocket connection.

Let's explore each of these steps in detail by first looking at things from a protocol perspective (as described in RFC 6455) before seeing how you can open/close and send data using the WebSocket API in browsers. 

How to establish a WebSocket connection

Establishing a connection at the WebSocket protocol level

Per the WebSocket protocol specification, the process of establishing a WebSocket connection is known as the opening handshake, and consists of an HTTP/1.1 request/response exchange between the client and the server. The client always initiates the handshake; it sends a GET request to the server, indicating that it wants to upgrade the connection from the HTTP protocol to WebSocket. 

Here’s a basic example of a GET request made by the client to initiate the opening handshake:

GET / HTTP/1.1
Host: example.com:8181
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: zy6Dy9mSAIM7GJZNf9rI1A==

The server must return an HTTP 101 Switching Protocols response code for the WebSocket connection to be successfully established:

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Sec-WebSocket-Accept: EDJa7WCAQQzMCYNJM42Syuo9SqQ=
Upgrade: websocket

Opening handshake headers

The table below describes the headers used by the client and the server during the opening handshake - both the required ones (illustrated in the code snippets above) and the optional ones.

| Header | Required | Description |
| --- | --- | --- |
| Host | Yes | The host name and optionally the port number of the server to which the request is being sent. |
| Connection | Yes | Indicates that the client wants to negotiate a change in the way the connection is being used. Value must be Upgrade. Also returned by the server. |
| Upgrade | Yes | Names the protocol the client wants to switch to. Value must be websocket. Also returned by the server. |
| Sec-WebSocket-Version | Yes | The only accepted value is 13. Any other version passed in this header is invalid. |
| Sec-WebSocket-Key | Yes | A base64-encoded one-time random value (nonce) sent by the client. Automatically handled for you by most WebSocket libraries or by using the WebSocket class provided in browsers. |
| Sec-WebSocket-Accept | Yes | A base64-encoded SHA-1 hashed value returned by the server as a direct response to Sec-WebSocket-Key. Indicates that the server is willing to initiate the WebSocket connection. |
| Sec-WebSocket-Protocol | No | Optional header field, containing a list of values indicating which subprotocols the client wants to speak, ordered by preference. The server needs to include this field together with one of the selected subprotocol values (the first one it supports from the list) in the response. |
| Sec-WebSocket-Extensions | No | Optional header field, initially sent from the client to the server, and then subsequently sent from the server to the client. It helps the client and server agree on a set of protocol-level extensions to use for the duration of the connection. |
| Origin | No | Header field sent by all browser clients (optional for non-browser clients). Used to protect against unauthorized cross-origin use of a WebSocket server by scripts using the WebSocket API in a web browser. The connection will be rejected if the Origin indicated is unacceptable to the server. |
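
To make the required client headers more concrete, here is a minimal sketch of how a Node.js server might check them before accepting an upgrade. The function name validateUpgradeRequest and the shape of the headers argument (a plain object with lower-cased names, as Node's http module provides) are assumptions for illustration, not part of any specific library:

```javascript
// Hypothetical helper: checks the required client handshake headers.
// `headers` is assumed to be a plain object with lower-cased header names.
function validateUpgradeRequest(headers) {
  const errors = [];

  if (!headers['host']) {
    errors.push('missing Host');
  }
  // Connection may carry several comma-separated tokens, e.g. "keep-alive, Upgrade".
  const connectionTokens = (headers['connection'] || '')
    .split(',')
    .map((token) => token.trim().toLowerCase());
  if (!connectionTokens.includes('upgrade')) {
    errors.push('Connection must include Upgrade');
  }
  if ((headers['upgrade'] || '').toLowerCase() !== 'websocket') {
    errors.push('Upgrade must be websocket');
  }
  if (headers['sec-websocket-version'] !== '13') {
    errors.push('Sec-WebSocket-Version must be 13');
  }
  // The key is a 16-byte nonce, base64-encoded.
  const key = headers['sec-websocket-key'] || '';
  if (Buffer.from(key, 'base64').length !== 16) {
    errors.push('Sec-WebSocket-Key must decode to 16 bytes');
  }
  return errors; // an empty array means the request looks acceptable
}
```

A server would typically answer with an HTTP 400 response when the returned array is not empty, and continue with the 101 Switching Protocols response otherwise.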

Sec-WebSocket-Key and Sec-WebSocket-Accept

It’s worth mentioning a few more details about two of the required headers used during the WebSocket handshake: Sec-WebSocket-Key and Sec-WebSocket-Accept. Together, these headers are essential in guaranteeing that both the server and the client are capable of communicating over WebSockets. 

First, we have Sec-WebSocket-Key, which is passed by the client to the server, and contains a 16-byte, base64-encoded one-time random value (nonce). Its purpose is to help ensure that the server does not accept connections from non-WebSocket clients (e.g., HTTP clients) that are being abused (or misconfigured) to send data to unsuspecting WebSocket servers. Here’s an example of Sec-WebSocket-Key:

Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

In direct relation to Sec-WebSocket-Key, the server response includes a Sec-WebSocket-Accept header. To generate it, the server concatenates the Sec-WebSocket-Key value sent by the client (as received, without base64-decoding it) with the static value (GUID) 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, computes the SHA-1 hash of the resulting string, and base64-encodes the hash.

Based on the Sec-WebSocket-Key example provided above, here’s the Sec-WebSocket-Accept header returned by the server:

Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
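
As a rough illustration of how a server could derive this value, the sketch below uses Node.js's built-in crypto module. The function name computeAcceptValue is an assumption made for this example; the key and expected result are the ones shown above:

```javascript
const { createHash } = require('crypto');

// Fixed GUID from RFC 6455, appended to the client's key before hashing.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derives the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key.
// The key is used as received (still base64-encoded).
function computeAcceptValue(secWebSocketKey) {
  return createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```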

Establishing a connection at the WebSocket API level

The WebSocket API in browsers (and most WebSocket libraries) automatically handles the opening handshake for you. All you have to do is to instantiate the WebSocket object, which will automatically attempt to open the connection to the server:

const socket = new WebSocket('wss://example.org');

An open event is raised when a WebSocket connection is established. It indicates that the opening handshake between the client and the server was successful, and the WebSocket connection can now be used to send and receive data. Here’s an example (note that the open event is handled through the onopen property):

// Create WebSocket connection
const socket = new WebSocket('wss://example.org');


// Connection opened
socket.onopen = function(e) {
   console.log('Connection open!');
};
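
The opening handshake can also fail, in which case an error event is raised, followed by a close event. The sketch below wires up both outcomes in one place. The connect helper and its WebSocketImpl parameter are hypothetical: the parameter defaults to the browser's WebSocket class and exists only so that the helper can be tried with a stub outside a browser:

```javascript
// Hypothetical helper: opens a socket and attaches open and error handlers.
// `WebSocketImpl` defaults to the browser's WebSocket class.
function connect(url, { onOpen, onError }, WebSocketImpl = WebSocket) {
  const socket = new WebSocketImpl(url);
  // Fired once the opening handshake has succeeded.
  socket.onopen = (event) => onOpen(socket, event);
  // Fired if the connection cannot be established or later fails.
  socket.onerror = (event) => onError(event);
  return socket;
}
```

In a browser, this might be called as connect('wss://example.org', { onOpen: () => console.log('Connection open!'), onError: (e) => console.error('Connection failed', e) }).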

How to transmit data over WebSockets

Data transmission: Protocol-level considerations

After a successful opening handshake, the client and the server can use the WebSocket connection to exchange messages in full-duplex mode. A WebSocket message consists of one or more frames.

The WebSocket frame has a binary syntax and contains several pieces of information, as shown in the following figure:

Let’s quickly summarize them:

  • FIN bit - indicates whether the frame is the final fragment in a WebSocket message.

  • RSV 1, 2, 3 - reserved for WebSocket extensions.

  • Opcode - determines how to interpret the payload data.

  • Mask - indicates whether the payload is masked or not.

  • Masking key - key used to unmask the payload data.

  • (Extended) payload length - the length of the payload.

  • Payload data - consists of application and extension data.

We will now take a more detailed look at all these constituent parts of a WebSocket frame.  

FIN bit and fragmentation

There are numerous scenarios where fragmenting a WebSocket message into multiple frames is required (or at least desirable). For example, fragmentation is often used to improve performance. Without fragmentation, an endpoint would have to buffer the entire message before sending it. With fragmentation, the endpoint can choose a reasonably sized buffer, and when that is full, send subsequent frames as a continuation. The receiving endpoint then assembles the frames to recreate the WebSocket message.    

Here’s what a single-frame message might look like:

0x81 0x05 0x48 0x65 0x6c 0x6c 0x6f (contains "Hello")

In comparison, with fragmentation, the same message would look like this:

0x01 0x03 0x48 0x65 0x6c (contains "Hel")
0x80 0x02 0x6c 0x6f (contains "lo")

The WebSocket protocol makes fragmentation possible via the first bit of the WebSocket frame — the FIN bit, which indicates whether the frame is the final fragment in a message. If it is, the FIN bit must be set to 1. Any other frame must have the FIN bit clear. 
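
To see how the FIN bit shows up in the byte sequences above, the following sketch decodes the first byte of a frame. The FIN bit is the most significant bit of that byte, and the opcode occupies its lowest four bits. The function name readFirstByte is an assumption for this example:

```javascript
// Reads the FIN bit and the opcode from the first byte of a frame.
function readFirstByte(byte) {
  return {
    fin: (byte & 0x80) !== 0, // most significant bit
    opcode: byte & 0x0f,      // lowest four bits
  };
}

console.log(readFirstByte(0x81)); // { fin: true, opcode: 1 }  single-frame text message
console.log(readFirstByte(0x01)); // { fin: false, opcode: 1 } first fragment of a text message
console.log(readFirstByte(0x80)); // { fin: true, opcode: 0 }  final continuation fragment
```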

RSV 1-3

RSV1, RSV2, and RSV3 are reserved bits. They must be 0 unless an extension was negotiated during the opening handshake that defines non-zero values.

Opcodes

Every frame has an opcode that determines how to interpret that frame’s payload data. The standard opcodes currently in use are defined by RFC 6455 and maintained by the Internet Assigned Numbers Authority (IANA).

| Opcode | Description |
| --- | --- |
| 0 | Continuation frame; continues the payload from the previous frame. |
| 1 | Indicates a text frame (UTF-8 text data). |
| 2 | Indicates a binary frame. |
| 3-7 | Reserved for future non-control (data) frames. |
| 8 | Connection close frame; leads to the connection being terminated. |
| 9 | A ping frame. Serves as a heartbeat mechanism ensuring the connection is still alive. The receiver must respond with a pong frame. |
| 10 | A pong frame. Serves as a heartbeat mechanism ensuring the connection is still alive. Sent as a response after receiving a ping frame. |
| 11-15 | Reserved for future control frames. |

Masking

Each WebSocket frame sent by the client to the server needs to be masked with the help of a random masking-key (32-bit value). This key is contained within the frame, and it’s used to obfuscate the payload data. However, when data flows the other way around, the server must not mask any frames it sends to the client. 

On the server-side, frames received from the client must be unmasked before further processing. Here’s an example of how you can do that:

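// Unmasks the payload of a client frame (both arguments are Node.js Buffers)
var unmask = function(mask, buffer) {
   // Buffer.alloc replaces the deprecated `new Buffer(size)` constructor
   var payload = Buffer.alloc(buffer.length);
   for (var i = 0; i < buffer.length; i++) {
       // XOR each byte with the masking-key byte at position i mod 4
       payload[i] = mask[i % 4] ^ buffer[i];
   }
   return payload;
};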
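
Because masking is a plain XOR, the same operation both masks and unmasks a payload. The sketch below checks this against the masked "Hello" example given in RFC 6455 (masking key 0x37 0xfa 0x21 0x3d). The function name applyMask is an assumption for this example:

```javascript
// XORs each payload byte with the masking key, cycling through its four bytes.
// Applying it twice with the same key returns the original payload.
function applyMask(maskingKey, payload) {
  const result = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    result[i] = payload[i] ^ maskingKey[i % 4];
  }
  return result;
}

// Masked "Hello" payload from the RFC 6455 examples.
const maskingKey = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const masked = Buffer.from([0x7f, 0x9f, 0x4d, 0x51, 0x58]);

console.log(applyMask(maskingKey, masked).toString('utf8')); // prints "Hello"
```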

Payload length and payload data

The WebSocket protocol encodes the length of the payload data using a variable number of bytes:

  • For payloads of up to 125 bytes, the length is packed into the 7-bit length field of the second header byte.

  • For payloads of 126 to 65,535 bytes, the length field is set to 126 and two extra header bytes carry the length as a 16-bit unsigned integer.

  • For larger payloads, the length field is set to 127 and eight extra header bytes carry the length as a 64-bit unsigned integer.
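
As a rough illustration of this variable-length encoding: per RFC 6455, the low 7 bits of the second header byte hold either the length itself (0 to 125) or a marker value (126 or 127) announcing a 16-bit or 64-bit length in the bytes that follow. The sketch below decodes all three cases from a Node.js Buffer; the function name readPayloadLength is an assumption for this example:

```javascript
// Reads the payload length from the start of a frame (a Node.js Buffer).
// Returns the length and the offset of the first byte after the length field.
function readPayloadLength(frame) {
  const lengthField = frame[1] & 0x7f; // low 7 bits of the second byte
  if (lengthField <= 125) {
    return { length: lengthField, offset: 2 };
  }
  if (lengthField === 126) {
    return { length: frame.readUInt16BE(2), offset: 4 };
  }
  // lengthField === 127: 64-bit length. Converting to a Number is exact
  // only up to 2^53 - 1, which is enough for this illustration.
  return { length: Number(frame.readBigUInt64BE(2)), offset: 10 };
}

console.log(readPayloadLength(Buffer.from([0x81, 0x05])));
// { length: 5, offset: 2 }
console.log(readPayloadLength(Buffer.from([0x82, 0x7e, 0x01, 0x00])));
// { length: 256, offset: 4 }
console.log(readPayloadLength(Buffer.from([0x82, 0x7f, 0, 0, 0, 0, 0, 1, 0, 0])));
// { length: 65536, offset: 10 }
```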

The WebSocket protocol supports two types of payload data: text (UTF-8 Unicode text) and binary.

Data transmission with the WebSocket API

WebSocket programming follows an asynchronous, event-driven programming model. As long as a WebSocket connection is open, the client and the server simply listen for events in order to handle incoming data and changes in connection status (with no need for polling).

The message event is fired when data is received through a WebSocket. Messages might contain string (plain text) or binary data, and it's up to you how that data will be processed and visualized. 

Here’s an example of how to handle a message event (using the onmessage property):

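// Receive binary messages as ArrayBuffer (the default binaryType is 'blob')
socket.binaryType = 'arraybuffer';

socket.onmessage = function(msg) {
   if (msg.data instanceof ArrayBuffer) {
      processArrayBuffer(msg.data);
   } else {
      processText(msg.data);
   }
};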
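
In practice, text messages often carry JSON, and the handler needs to decide what to do with each one. The sketch below shows one possible way to route incoming data. The routeMessage function, the handlers object, and the { type, payload } message shape are all assumptions made for illustration; the WebSocket API itself does not prescribe any message format:

```javascript
// Hypothetical router for the `data` field of a message event.
// Assumes text messages are JSON objects shaped like { type, payload }.
function routeMessage(data, handlers) {
  if (typeof data !== 'string') {
    return handlers.onBinary(data); // ArrayBuffer or Blob
  }
  let message;
  try {
    message = JSON.parse(data);
  } catch (err) {
    return handlers.onInvalid(data); // not valid JSON
  }
  const known =
    message !== null &&
    typeof message === 'object' &&
    Object.prototype.hasOwnProperty.call(handlers.byType, message.type);
  return known
    ? handlers.byType[message.type](message.payload)
    : handlers.onInvalid(data);
}
```

It could then be attached with socket.onmessage = (msg) => routeMessage(msg.data, handlers), where handlers supplies onBinary, onInvalid, and a byType map of callbacks.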

To send messages via the WebSocket API you have to use the send() method, as demonstrated below:

socket.onopen = function(e) {
   socket.send(JSON.stringify({'msg': 'payload'}));
}

The sample code above shows how to send text (string) messages. However, in addition to strings, you can also send binary data (a Blob, an ArrayBuffer, or a typed array view over an ArrayBuffer):

var buffer = new ArrayBuffer(128);
socket.send(buffer);


var intview = new Uint32Array(buffer);
socket.send(intview);


var blob = new Blob([buffer]);
socket.send(blob);

How to close WebSocket connections

Closing a WebSocket connection at the protocol level

The process of closing a WebSocket connection is known as the closing handshake. You initiate it by sending a close frame with an opcode of 8. In addition to the opcode, the close frame may contain a body that indicates the reason for closing. This body consists of a status code (integer) and a UTF-8 encoded string (the reason).

The standard status codes that can be used during the closing handshake are defined by RFC 6455 (later additions, such as 1012 to 1014, are registered with IANA), and are listed in the following table:

| Status code | Name | Description |
| --- | --- | --- |
| 0-999 | N/A | Codes below 1000 are invalid and cannot be used. |
| 1000 | Normal closure | Indicates a normal closure, meaning that the purpose for which the WebSocket connection was established has been fulfilled. |
| 1001 | Going away | Should be used when closing the connection and there is no expectation that a follow-up connection will be attempted (e.g., server shutting down, or browser navigating away from the page). |
| 1002 | Protocol error | The endpoint is terminating the connection due to a protocol error. |
| 1003 | Unsupported data | The connection is being terminated because the endpoint received data of a type it cannot handle (e.g., a text-only endpoint receiving binary data). |
| 1004 | Reserved | Reserved. A meaning might be defined in the future. |
| 1005 | No status received | Reserved; must not be sent in a close frame. Used by apps and the WebSocket API to indicate that no status code was received, although one was expected. |
| 1006 | Abnormal closure | Reserved; must not be sent in a close frame. Used by apps and the WebSocket API to indicate that a connection was closed abnormally (e.g., without sending or receiving a close frame). |
| 1007 | Invalid payload data | The endpoint is terminating the connection because it received a message containing inconsistent data (e.g., non-UTF-8 data within a text message). |
| 1008 | Policy violation | The endpoint is terminating the connection because it received a message that violates its policy. This is a generic status code; it should be used when other status codes are not suitable, or if there is a need to hide specific details about the policy. |
| 1009 | Message too big | The endpoint is terminating the connection due to receiving a message that is too large to process. |
| 1010 | Mandatory extension | The client is terminating the connection because the server failed to negotiate an extension during the opening handshake. |
| 1011 | Internal error | The server is terminating the connection because it encountered an unexpected condition that prevented it from fulfilling the request. |
| 1012 | Service restart | The server is terminating the connection because it is restarting. |
| 1013 | Try again later | The server is terminating the connection due to a temporary condition, e.g., it is overloaded. |
| 1014 | Bad gateway | The server was acting as a gateway or proxy and received an invalid response from the upstream server. Similar to 502 Bad Gateway HTTP status code. |
| 1015 | TLS handshake | Reserved; must not be sent in a close frame. Indicates that the connection was closed due to a failure to perform a TLS handshake (e.g., the server certificate can’t be verified). |
| 1016-1999 | N/A | Reserved for future use by the WebSocket standard. |
| 2000-2999 | N/A | Reserved for future use by WebSocket extensions. |
| 3000-3999 | N/A | Reserved for use by libraries, frameworks, and applications. Available for registration at IANA on a first-come, first-served basis. |
| 4000-4999 | N/A | Range reserved for private use in applications. |

Both the client and the web server can initiate the closing handshake. Upon receiving a close frame, an endpoint (client or server) has to send a close frame as a response (echoing the status code received). After an endpoint has both sent and received a close frame, the closing handshake is complete, and the WebSocket connection is considered closed.
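
To illustrate the close frame body described earlier in this section, here is a minimal sketch that builds and parses it with Node.js Buffers. Per RFC 6455, the status code is a 2-byte unsigned integer in network byte order, followed by the optional UTF-8 reason. The function names buildCloseBody and parseCloseBody are assumptions for this example:

```javascript
// Builds the body of a close frame: 2-byte big-endian status code + UTF-8 reason.
function buildCloseBody(code, reason = '') {
  const reasonBytes = Buffer.from(reason, 'utf8');
  const body = Buffer.alloc(2 + reasonBytes.length);
  body.writeUInt16BE(code, 0);
  reasonBytes.copy(body, 2);
  return body;
}

// Parses a close frame body; an empty body carries no status code.
function parseCloseBody(body) {
  if (body.length < 2) {
    return { code: null, reason: '' };
  }
  return { code: body.readUInt16BE(0), reason: body.toString('utf8', 2) };
}

const body = buildCloseBody(1000, 'done');
console.log(body);                 // <Buffer 03 e8 64 6f 6e 65>
console.log(parseCloseBody(body)); // { code: 1000, reason: 'done' }
```

Since control frames are limited to a 125-byte payload, the reason can take up at most 123 bytes after the status code.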

Closing a WebSocket connection with the WebSocket API 

The close() method is used to close the WebSocket connection. After this method is called, no more data can be sent or received over the WebSocket connection. 

Here’s the most basic example of calling the close() method:

socket.close();

Optionally, you can pass two arguments to the close() method:

  • code. A numeric value indicating the status code explaining why the connection is being closed. In browsers, it must be either 1000 or a value in the 3000-4999 range; any other value causes close() to throw an error. See the Status codes table in the previous section of this article for details.

  • reason. A human-readable string explaining why the connection is closing. It must be no longer than 123 bytes when encoded as UTF-8.

Here’s an example of calling the close() method with the two optional parameters:

socket.close(1000, "Work complete");

A close event fires when the WebSocket connection closes. This is how you listen for a close event:

socket.onclose = function(e) {
   console.log("Connection closed", e);
};
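
In browsers, close() throws if the code is neither 1000 nor in the 3000-4999 range, or if the reason is longer than 123 bytes once UTF-8 encoded. The sketch below checks both conditions up front so that the failure is easier to diagnose. The safeClose wrapper is hypothetical, not part of the WebSocket API:

```javascript
// Hypothetical wrapper around socket.close() that validates its arguments.
function safeClose(socket, code = 1000, reason = '') {
  const codeAllowed = code === 1000 || (code >= 3000 && code <= 4999);
  if (!codeAllowed) {
    throw new RangeError(`Close code ${code} cannot be sent by application code`);
  }
  // The limit applies to the UTF-8 byte length, not the character count.
  if (new TextEncoder().encode(reason).length > 123) {
    throw new RangeError('Close reason must be at most 123 bytes in UTF-8');
  }
  socket.close(code, reason);
}
```

On the receiving side, the event passed to the close handler exposes code, reason, and wasClean properties, which indicate why the connection closed and whether the closing handshake completed.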

What are the pros and cons of WebSockets?

The advantage of WebSockets is that they enable realtime communication between the client and server without the need for frequent HTTP requests/responses. This brings benefits such as reduced latency, and improved performance and responsiveness of web apps. 

Due to its persistent and bidirectional nature, the WebSocket protocol is more flexible than HTTP when building realtime apps that require frequent data exchanges. WebSockets are also more efficient, as they allow data to be transmitted without the need for repetitive HTTP headers and handshakes. This can reduce bandwidth usage and server load.

While WebSockets have plenty of advantages, they also come with some disadvantages. Here are the main ones:

  • WebSockets are not optimized for streaming audio and video data.

  • WebSockets don’t automatically recover when connections are terminated.

  • Some environments (such as corporate networks with proxy servers) will block WebSocket connections.

  • WebSockets are stateful, which makes them hard to use in large-scale systems.
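
The lack of automatic recovery is commonly worked around in application code by reconnecting with an increasing delay. The following is only a sketch of that idea, not a production-ready strategy: the function names, the createSocket factory, and the default delays are all assumptions, and it reconnects after every close, including intentional ones:

```javascript
// Delay before reconnection attempt number `attempt` (0-based): doubles
// with each attempt and is capped at `maxMs`. Defaults are illustrative.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical reconnect loop; `createSocket` is a caller-supplied factory,
// e.g. () => new WebSocket('wss://example.org').
function connectWithRetry(createSocket, attempt = 0) {
  const socket = createSocket();
  // A successful open resets the backoff for the next disconnection.
  socket.onopen = () => { attempt = 0; };
  socket.onclose = () => {
    setTimeout(
      () => connectWithRetry(createSocket, attempt + 1),
      backoffDelay(attempt)
    );
  };
  return socket; // note: only the first socket is returned to the caller
}

console.log([0, 1, 2, 3, 10].map((n) => backoffDelay(n)));
// [ 1000, 2000, 4000, 8000, 30000 ]
```

Real implementations usually add random jitter to the delay so that many clients do not reconnect at the same moment, and they also have to resynchronize any messages missed while disconnected.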

Some of these limitations can be overcome by using a WebSocket-based platform such as Ably, which has built-in logic to handle automatic reconnection and to scale without a fixed ceiling.

Learn more about the pros and cons of WebSockets

What are the best alternatives to WebSockets? 

WebSocket is an excellent choice for use cases where it’s critical (or at least desirable) for data to be sent and consumed in realtime or near-realtime. However, there is rarely a one-size-fits-all protocol — different protocols serve different purposes better than others. Realtime alternatives to WebSockets include:

Learn more about WebSocket alternatives

How to start building realtime experiences with WebSockets

Getting started with WebSockets is straightforward. The WebSocket API is simple to use, and there are numerous WebSocket libraries and frameworks available for most popular programming languages. Most of them are built on top of the raw WebSocket protocol and provide additional capabilities, making it easier and more convenient for developers to add WebSocket-based functionality to their apps.

If you’re just getting started with WebSockets and you’re looking to build your first realtime app powered by WebSockets, check out this step-by-step tutorial. It teaches you how to develop an interactive cursor position-sharing demo using two simple open-source WebSocket libraries. It’s the kind of project that requires bidirectional, instant communication between client and server — the type of use case where the WebSocket technology truly shines.

On the other hand, shipping production-ready realtime functionality powered by open-source WebSocket libraries is not as easy as building a simple demo app. It’s a path riddled with obstacles and engineering complexities. See, for example, the many engineering challenges involved in scaling Socket.IO, one of the most popular open-source WebSocket libraries out there.

If you want to avoid the challenges and costs of scaling and maintaining WebSocket infrastructure in-house, you can offload this complexity to a managed third-party PaaS such as Ably.

Ably, the WebSocket platform that works reliably at any scale

Ably is a realtime experience infrastructure provider. Our APIs and SDKs help developers build and deliver realtime experiences without having to worry about maintaining and scaling messy WebSocket infrastructure. 

Key Ably features and capabilities:

  • Pub/sub messaging over serverless WebSockets, with rich features such as message delta compression, automatic reconnections with continuity, user presence, message history, and message interactions.

  • A globally-distributed network of datacenters and edge acceleration points-of-presence. 

  • Guaranteed message ordering and delivery. 

  • Global fault tolerance and a 99.999% uptime SLA.

  • < 50ms round-trip latency (P99).

  • Dynamic elasticity, so we can quickly scale to handle any demand (billions of WebSocket messages sent to millions of pub/sub channels and WebSocket connections). 

WebSocket FAQs 

What is a WebSocket connection?

You can think of a WebSocket connection as a long-lived, bidirectional, full-duplex communication channel between a web client and a web server. Note that WebSocket connections work on top of TCP. 

Are WebSockets scalable?

Yes, WebSockets are scalable. Companies like Slack, Netflix, and Uber use WebSockets to power realtime features in their apps for millions of end-users. For example, Slack uses WebSockets for instant messaging between chat users.

However, scaling WebSockets is non-trivial, and involves numerous engineering decisions and technical trade-offs. Among them:

  • Should you use vertical or horizontal scaling?

  • How do you deal with unpredictable loads?

  • How do you manage WebSocket connections at scale?

  • How much bandwidth is being used overall, and how is it impacting your budget?

  • Do you have to deal with traffic spikes, and if so, what is the performance impact on the server layer?

  • How will you automatically add additional server capacity if and when it’s needed?

  • How do you ensure data integrity (guaranteed message ordering and delivery) at scale?

Learn more about the challenges of scaling WebSockets

Are WebSockets secure?

WebSockets can be secure if they are implemented with appropriate security measures. Secure WebSocket connections use the "wss://" URI scheme. This indicates that the connection is encrypted with TLS, which protects the data transmitted between the WebSocket client and WebSocket server from being read or tampered with by third parties in transit.

Note, however, that browsers do not apply the same-origin policy or cross-origin resource sharing (CORS) restrictions to WebSocket connections. Instead, the server is expected to check the Origin header sent during the opening handshake and reject connections from origins it does not trust.

Note that the WebSocket protocol doesn’t prescribe any particular way for servers to authenticate clients. For example, you can handle authentication during the opening handshake, by using cookie headers. Another option is to manage authentication (and authorization) at the application level, by using techniques such as JSON Web Tokens.

Learn more about common WebSocket security vulnerabilities - and how to prevent them

Are WebSockets faster than HTTP? 

In the context of realtime apps that require frequent data exchanges, WebSockets are faster than HTTP. 

HTTP connections have additional overhead, such as headers and other metadata, that can add latency and reduce performance compared to WebSocket connections, which are designed for persistent, low-latency, bidirectional communication. With WebSockets, there’s no need for multiple HTTP requests and responses. This can result in faster communication and lower latency. 

Learn more about the differences between WebSockets and HTTP

Are WebSockets synchronous or asynchronous?

WebSockets are asynchronous by design, meaning that data can be sent and received at any time, without blocking or waiting for a response. However, it's important to note that while WebSockets themselves are asynchronous, the code used to handle WebSocket events and messages may be synchronous or asynchronous, depending on how it’s written.

Are WebSockets expensive?

A WebSocket connection is not inherently expensive, as it's designed to be lightweight and efficient, with minimal overhead. That being said,  building and managing a scalable and reliable WebSocket system in-house is expensive, time-consuming, and requires significant engineering effort:

  • 10.2 person-months is the average time to build basic WebSocket infrastructure, with limited scalability, in-house.

  • Half of all self-built WebSocket solutions require $100K-$200K a year in upkeep.

Learn more about what it costs to build WebSocket infrastructure in-house

What browsers support WebSockets?

WebSockets are supported by most modern web browsers, including:

  • Google Chrome (version 4 and later).

  • Mozilla Firefox (version 4 and later).

  • Safari (version 5 and later).

  • Microsoft Edge (version 12 and later).

  • Opera (version 10.70 and later).

  • Internet Explorer (version 10 and later).

Note that older versions of these browsers either don’t support WebSockets, or have limited support. At the time of writing (25th of April 2023), Opera Mini is the only modern browser that doesn’t support WebSockets. 

How long can a WebSocket stay open?

In general, WebSocket connections can stay open indefinitely as long as both the client and server remain connected and the network is stable. 

Are WebSockets stateful or stateless?

Unlike HTTP, a WebSocket connection is persistent and stateful. This makes WebSockets hard to use in large-scale systems that consist of multiple WebSocket servers (you need to share connection state across servers). 

Read about the challenges of scaling WebSockets
