Chat and instant messaging are a key part of how many applications are driving product adoption, user engagement, and retention. As a product owner or developer, you’re almost spoiled for choice when it comes to deciding how to implement chat in your product.
In this article, we’re going to go deep and look at the protocols that power chat. We’ll start with an overview of the landscape of web chat protocols, review eight of the most common protocols, and finish with how to get started with instant messaging and chat.
Instant messaging and chat protocols: An overview
One challenge of reviewing instant messaging and chat protocols is that many of them solve different problems or different parts of the problem. At its most basic, a chat protocol needs to deliver messages between participants. But there are two dimensions that separate one protocol from another:
- Which parts of the problem that protocol tackles.
- How thoroughly it tackles those parts of the problem.
The first is easy. Some protocols aim to take care of everything and provide an end-to-end solution, while others have a laser-like focus on solving just one aspect.
The second takes a little more consideration. For example, IRC is a protocol that appears to cover everything but that also lacks many of the modern conveniences that users expect.
One useful way to think about this is in comparison with the OSI seven-layer networking model. Here’s a quick refresher: implementing a computer network requires protocols that operate at each level from the fibre or copper of the physical layer right up to the application layer.
TCP and UDP, for example, are two protocols that operate at the transport layer, each with different trade-offs. TCP is slower but promises that data will arrive in the right order, whereas UDP is faster and makes no such guarantees. Chat and instant messaging protocols offer similar compromises, depending on your use case.
Eight instant messaging and chat protocols to consider
Choosing the right protocol, or protocols, is one of the first architectural decisions you’ll take when planning out a chat or instant messaging service.
When considering a protocol, it’s useful to consider:
- How clients connect with each other: Client-server or peer to peer.
- The types of data it can support: Whether it supports text only or streaming audio and video as well.
- Reliability and latency: The measures the protocol takes to ensure delivery and how that impacts the speed with which messages are exchanged.
- Scalability: How well the protocol tackles increased data flows, clients, and so on.
- Security: Whether the protocol has built-in security, such as encrypting messages in transit, or requires that you bring in other tools to handle security.
Let’s look at some of the most popular protocols for instant messaging.
1. WebSocket
WebSocket is an event-driven, low-latency protocol designed from the ground up to support ongoing, two-way connections between servers and clients, which makes it well suited to chat and instant messaging. However, it’s worth noting that WebSocket is not in itself a chat protocol. Instead, it is a realtime messaging protocol on top of which you can build chat.
So, how does it shape up?
Pros of WebSocket
- Bidirectional and full-duplex: Ideal for chat, because messages travel in both directions at the same time over a single connection between client and server.
- Efficient use of server resources: WebSocket’s persistent connections avoid the overhead of repeated HTTP request/response cycles.
- Low latency: As an event-driven protocol, WebSocket pushes data to the client or server the moment it is available. Unlike HTTP, which sends headers and other metadata with every request and response, an established WebSocket connection adds only a few bytes of framing to each message.
- Widespread: All major browsers and various libraries have built-in support for WebSocket, which makes uptake easier for your end users.
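To make the low-overhead point concrete, here is a minimal sketch of how a single text message is framed once a connection is established, following the base framing rules in RFC 6455. The function name `encode_text_frame` is our own, and the sketch assumes an unfragmented text message with no extensions; it is an illustration rather than a full WebSocket implementation.

```python
import os
import struct


def encode_text_frame(message: str, mask: bool = True) -> bytes:
    """Encode one unfragmented WebSocket text frame (illustrative sketch)."""
    payload = message.encode("utf-8")
    first_byte = 0x80 | 0x1  # FIN bit set, opcode 0x1 (text)
    mask_bit = 0x80 if mask else 0x00
    length = len(payload)

    if length <= 125:
        # Short payloads: the length fits in the second header byte.
        header = struct.pack("!BB", first_byte, mask_bit | length)
    elif length <= 0xFFFF:
        # Medium payloads: marker 126, then a 16-bit length.
        header = struct.pack("!BBH", first_byte, mask_bit | 126, length)
    else:
        # Large payloads: marker 127, then a 64-bit length.
        header = struct.pack("!BBQ", first_byte, mask_bit | 127, length)

    if not mask:
        # Server-to-client frames are sent unmasked.
        return header + payload

    # Client-to-server frames are XOR-masked with a random 4-byte key.
    key = os.urandom(4)
    masked = bytes(byte ^ key[i % 4] for i, byte in enumerate(payload))
    return header + key + masked
```

Under these assumptions, an unmasked "hello" from the server occupies seven bytes on the wire: two bytes of header and five of payload. The same message sent from a client adds a four-byte masking key.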
Cons of WebSocket
- Less suited to audio and video: While it is possible to stream audio and video using a WebSocket connection, the high bandwidth of those data types makes them better suited to protocols that establish multiple concurrent connections. Similarly, while WebSocket offers a good trade-off between latency and delivery guarantees, streaming video and audio can usually tolerate some lost data. Dedicated rich media protocols take advantage of that by offering lower latencies in exchange for fewer guarantees that individual packets of data will reach the client.
- No auto-reconnection: If a WebSocket connection drops, you’ll need to reestablish it manually. That means writing code that monitors the connections and initiates the reconnection if needed.
- Somewhat harder to scale horizontally: WebSockets maintain connection state, which can make them challenging to use in large-scale systems with multiple WebSocket servers. Maintaining connection state would require synchronizing each connection’s data across all of the servers that could potentially serve that connection.
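The reconnection logic mentioned above is often written as a retry loop with capped exponential backoff and jitter, so that many clients do not all reconnect at the same instant after an outage. The sketch below is one possible shape for it. The `connect` callable is hypothetical and stands in for whatever function your WebSocket library uses to open a connection, and the default delays are arbitrary choices rather than recommendations.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.5, cap: float = 30.0,
                   attempts: int = 6) -> Iterator[float]:
    """Yield capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def connect_with_retry(connect: Callable[[], T],
                       base: float = 0.5,
                       cap: float = 30.0,
                       attempts: int = 6,
                       sleep: Callable[[float], None] = time.sleep,
                       jitter: Callable[[], float] = random.random) -> T:
    """Call `connect` until it succeeds or the attempts are used up."""
    last_error = None
    for delay in backoff_delays(base, cap, attempts):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            # Wait a random fraction of the delay before the next attempt.
            sleep(delay * jitter())
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Passing `sleep` and `jitter` as parameters is a design choice that keeps the loop easy to test without real waiting. In production code you would also want to call this from whatever handler detects that the connection has dropped.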
2. WebRTC
While WebSocket is built around a client-server model, Web Real-Time Communication (WebRTC) relies on peer-to-peer connections. This can make it better suited to workloads where introducing a server would be unnecessary, such as file transfer or ad-hoc video calling between two people who are already connected in some other way, for example by taking part in an online chat together.
What are the pros and cons of WebRTC?
Pros of WebRTC
- Well suited to video and audio content: Unlike WebSocket, WebRTC is designed specifically for streaming rich media.
- Increased resilience in some situations: In a client-server model, a problem on the server could disrupt data transfer to multiple clients. As WebRTC connections are peer to peer, there isn’t a central point where a failure could impact multiple data streams in the same way. However, many WebRTC services do rely on a centralized back-end for directory and other services.
- Lighter weight infrastructure: Because media and data flow directly between peers, there is no central media server to architect, deploy, and maintain, although you still need a signaling mechanism so that peers can find each other.
- Adaptive approach to latency: WebRTC favors UDP for lower latencies and greater concurrent volumes of data, can fall back to TCP where UDP is blocked, and lets you configure its data channels for either reliable, ordered delivery or faster, unordered delivery depending on what’s more important.
Cons of WebRTC
- Designed to drop data where necessary: The flip side of lower latency is that WebRTC is more concerned with the speed of delivery than with data integrity. That’s fine for streaming video and audio but less suitable for text or JSON data.
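The trade-off between speed and integrity can be illustrated with a small receiver that never waits for late data. This is a conceptual sketch only: `LossyReceiver` is a hypothetical class of our own and is not part of any WebRTC API. It assumes each frame carries a sequence number and simply discards anything that arrives after a newer frame has already been played.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LossyReceiver:
    """Plays frames as they arrive and drops any that turn up too late."""
    last_played: int = -1
    dropped: List[int] = field(default_factory=list)

    def receive(self, sequence: int, frame: str) -> Optional[str]:
        if sequence <= self.last_played:
            # A newer frame has already been played, so this one is stale.
            self.dropped.append(sequence)
            return None
        self.last_played = sequence
        return frame
```

For video, skipping a stale frame is usually invisible to the viewer. For chat text, the same behavior would silently lose a message, which is why a reliable, ordered channel is the safer choice for text or JSON.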
3. MQTT
MQTT is a lightweight protocol designed primarily for machine-to-machine communication, especially in situations where bandwidth and compute power are limited. Clients subscribe to topics that are managed by a central message broker. When a client publishes a message, it sends it to the broker, which then distributes it to that topic’s subscribers.
Typically, MQTT is found in internet of things (IoT) applications, such as sending new tariff information to smart utility meters.
However, the same characteristics that suit MQTT to IoT also lend themselves to chat and instant messaging. Facebook Messenger is one chat platform that uses MQTT, specifically because it puts relatively little strain on bandwidth and device battery.
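The publish and subscribe flow described above can be sketched with a toy in-memory broker. MQTT calls its channels topics, and subscribers can use the `+` (single level) and `#` (multi-level) wildcards in topic filters. The `Broker` class, the `topic_matches` helper, and the `chat/room1/messages` topic layout are all our own illustrative assumptions; a real broker also handles sessions, QoS, retained messages, and networking.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str, str], None]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches this level and everything below.
            return True
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            # "+" matches exactly one level; anything else must be equal.
            return False
    return len(filter_levels) == len(topic_levels)


class Broker:
    """Toy broker that fans each published message out to subscribers."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Handler]] = []

    def subscribe(self, topic_filter: str, handler: Handler) -> None:
        self._subscriptions.append((topic_filter, handler))

    def publish(self, topic: str, payload: str) -> int:
        """Deliver the payload to matching subscribers; return the count."""
        delivered = 0
        for topic_filter, handler in self._subscriptions:
            if topic_matches(topic_filter, topic):
                handler(topic, payload)
                delivered += 1
        return delivered
```

In a chat setting, a client might subscribe to `chat/+/messages` to receive messages from every room, while publishing to `chat/room1/messages` to post in one of them.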
But is it the right protocol for your chat app?
Pros of MQTT
- Lightweight: MQTT has very little overhead, meaning that it can operate in resource-constrained environments such as cellular networks with spotty coverage.
- Reliable, ordered delivery: With three levels of Quality of Service (QoS), MQTT offers at most once, at least once, and exactly once delivery modes so you can adapt to difficult network conditions.
- Flow control: MQTT helps to prevent congestion and overwhelming the central message broker by limiting how many messages each client can send.
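To show what a delivery guarantee costs in practice, here is a simplified sketch of at least once delivery (QoS 1 in MQTT terms): the sender retransmits until it sees an acknowledgement, which means the receiver may get duplicates. The `DeduplicatingReceiver` class shows one way a receiver can discard those duplicates by packet identifier. Both names are hypothetical, and this is not the handshake MQTT itself uses for exactly once delivery, which involves additional control packets.

```python
from typing import Callable, List, Set


def send_at_least_once(packet_id: int, payload: str,
                       transmit: Callable[[int, str], bool],
                       max_attempts: int = 5) -> int:
    """Retransmit until `transmit` reports an acknowledgement.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        if transmit(packet_id, payload):
            return attempt
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


class DeduplicatingReceiver:
    """Delivers each packet identifier to the application only once."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()
        self.delivered: List[str] = []

    def on_packet(self, packet_id: int, payload: str) -> None:
        if packet_id in self._seen:
            # A retransmission of something we already processed.
            return
        self._seen.add(packet_id)
        self.delivered.append(payload)
```

The extra round trips and bookkeeping are the overhead you accept in return for stronger guarantees, which is why MQTT lets you pick the level per message.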
Cons of MQTT
- Message broker limits scaling: Relying on a central message broker complicates the scalability of MQTT and also introduces a single point of failure.
- Somewhat harder to develop with: As a bandwidth-conscious protocol, MQTT limits developer conveniences. For example, MQTT 3.1.1 defines just five standard connection error codes, which can make debugging harder if the error falls outside those anticipated by the protocol designers.
- Unsuitable for video and audio: As it is built with low bandwidth use cases in mind, MQTT isn’t optimized for streaming audio or video.
- Not secure by default: Credentials are passed in plain text, so you’ll need a secondary security mechanism, such as TLS, alongside MQTT.
4. XMPP
Originating in the 1990s, the Extensible Messaging and Presence Protocol (XMPP) is an XML-based protocol built to power open source instant messaging tools. Today, it is an IETF standard, and a slimmed-down version called FunXMPP is the protocol that underlies WhatsApp.
XMPP uses a decentralized client-server architecture. Each XMPP server can connect to others on a peer-to-peer basis, increasing both resilience and scalability. Perhaps unusually, though, XMPP does not specify the transport layer.
As a dedicated instant messaging protocol, what are the advantages and disadvantages of XMPP?
Pros of XMPP
- Resilience and scalability: With a decentralized architecture, it is easier to scale XMPP based chat and instant messaging apps, while also being hardened against individual server failure.
- Transport flexibility: XMPP is transport agnostic, meaning that it can route around firewalls and other restrictions. Typically, XMPP runs on TCP but can also use HTTP, WebSocket, and other delivery mechanisms.
- Built-in security: The protocol offers several ways to secure communication, such as encrypting connections using Transport Layer Security (TLS) and optional message signing.
Cons of XMPP
- Complexity: XMPP’s flexibility, such as in how data is transported, and its extensibility mean that building it into your chat app can take more development effort.
- XML-based: XMPP uses XML to structure data. More modern chat protocols are JSON based, making them potentially a better fit for existing tooling.
- Less efficient: XMPP is a relatively heavyweight protocol, meaning that the same data can take more bandwidth and suffer from longer latencies than protocols such as WebSocket.
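For a sense of what XML-based messaging looks like, the sketch below builds and reads a simplified XMPP-style chat message using Python’s standard XML library. The helper names and the example addresses are placeholders of our own. The sketch also leaves out the XML namespaces and the surrounding stream that a real XMPP connection requires, so treat it as an illustration of the message shape rather than something you could send to a server as-is.

```python
import xml.etree.ElementTree as ET


def build_message_stanza(sender: str, recipient: str, body: str) -> str:
    """Build a simplified <message/> element for a one-to-one chat."""
    message = ET.Element(
        "message", {"from": sender, "to": recipient, "type": "chat"}
    )
    # ElementTree escapes characters such as "&" and "<" when serializing.
    ET.SubElement(message, "body").text = body
    return ET.tostring(message, encoding="unicode")


def read_body(stanza: str) -> str:
    """Extract the text of the <body/> child, or "" if there is none."""
    return ET.fromstring(stanza).findtext("body", default="")
```

If most of your existing tooling expects JSON, you would need a translation step like this on every message, which is part of the development effort noted above.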
5. AMQP
The Advanced Message Queuing Protocol (AMQP) is a queue-based publish-subscribe messaging protocol. It is multi-layered, which means it takes care of both the transport layer and messaging layer.
Like MQTT, AMQP relies on a central message broker. However, with its origins in financial trading, AMQP differs from MQTT in that it is designed for high throughput in time sensitive applications.
So, is AMQP a good choice for chat? Let’s look at the pros and cons.
Pros of AMQP
- Scalability: AMQP can scale horizontally by adding more message brokers (servers). However, creating a cluster of AMQP message brokers requires broker software that supports clustering, such as RabbitMQ.
- Built-in security: AMQP takes care of both connection security and message encryption.
- Extensibility: The AMQP standard allows for API, codec, and security extensions. The AMQP Interoperability Technical Committee reviews and approves extensions, meaning that the standard can adapt to new needs without fragmentation.
Cons of AMQP
- Complexity: As a protocol designed for high throughput and flexibility, AMQP can be harder to implement than alternatives.
- Not suited to realtime use cases: AMQP’s focus is on delivering messages reliably rather than in realtime. This can make it a poor choice for chat applications.
6. Matrix
Matrix is a federated protocol originally designed for use cases such as chat, IoT, and voice calling. In theory, Matrix could be used for any realtime messaging. Using bridges to other services, Matrix can also interact with other tools such as Slack, Skype, and WeChat.
Open source chat tool Rocket.Chat uses Matrix. But is it a good choice for your chat application?
Pros of Matrix
- Decentralization and federation: Matrix takes a fairly radical approach to scalability and resilience. Rather than relying on a single, central server or cluster, it creates a network of federated instances. That’s good for the scalability and durability of the broader network but it does come with some tradeoffs, as we’ll see in a moment.
- Extensibility: Matrix can connect to other networks and tools through its bridging concept.
Cons of Matrix
- Harder to scale: While decentralization can help the bigger network to grow, it also brings scaling challenges. Peer-to-peer federation complicates the distribution of messages across the network, especially as the number of concurrent users grows. That can make it hard for Matrix based chat applications to maintain realtime delivery of messages.
- Lack of standardization: With multiple, independent Matrix client implementations, there is the potential for inconsistencies in how the protocol operates from one to the next.
- Learning curve: For end users, federation introduces complexity into learning how to use the system. For example, chat rooms and users are spread across multiple servers, which makes discovery harder.
7. Server Sent Events
Before WebSocket, Server Sent Events was an early response to the problem of the web being stateless. At the time, updating the information on a web page required refreshing that page or using a technique such as long polling, both of which are somewhat inefficient. Server Sent Events sits on top of HTTP to open an ongoing, one-way connection between the client (usually a web browser) and the server. When the server has new data for the client, it pushes it over the connection.
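On the wire, each pushed event is plain text made up of `field: value` lines and terminated by a blank line. The sketch below formats one event in that way. The function name `format_sse` is our own, and the sketch assumes the server writes the result to an open HTTP response served with the `text/event-stream` content type.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Format a single Server Sent Events message as text."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Multi-line payloads are sent as one "data:" line per line of text.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"
```

Including an `id` with each event is worthwhile because a browser that reconnects sends the last identifier it saw in the `Last-Event-ID` request header, which lets the server resume from the right place.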
As a one-way protocol, Server Sent Events isn’t well suited to chat and instant messaging applications. But let’s look in more detail at the advantages and disadvantages.
Pros of Server Sent Events
- HTTP based: Being HTTP based makes it less likely that Server Sent Events connections will be blocked by firewalls. It also simplifies the implementation, as the underlying infrastructure on both the client and the server is already in place.
- Focused on doing one thing: Server Sent Events is a good choice where you need a simple, lightweight protocol that will only ever push relatively simple data from a server to the client.
- Supports reconnection: If the connection drops, Server Sent Events automatically reestablishes it. That saves development effort, compared to WebSocket where you need to monitor for disconnections and manually reconnect.
Cons of Server Sent Events
- One-way communication: The deal breaker for chat applications is that Server Sent Events is a one-way protocol. It could be suited to broadcast-only messages but there is no way to reply using Server Sent Events itself.
- Text only: Server Sent Events supports UTF-8 text messages only, meaning there’s no native way to send binary content such as images or audio without first encoding it as text.
8. IRC
Dating back to the late 1980s, Internet Relay Chat (IRC) has played an important role in the evolution of open source software communities, with networks each dedicated to different themes or localities. However, IRC has changed little over the decades and looks outdated in comparison to modern alternatives such as WebSocket, XMPP, and Matrix.
Pros of IRC
- Tried and tested: IRC has been in daily use for more than three decades, meaning that many bugs have been ironed out.
- Widespread: Although IRC is no longer as popular as it once was, clients are available for almost every platform, even including 8-bit home computers such as the Sinclair ZX Spectrum.
- Simple: IRC is focused on enabling realtime, text-based chat, meaning that the protocol itself is relatively simple.
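The simplicity of IRC shows in its wire format: each message is a single line of text with an optional prefix, a command, and some parameters, where the last parameter may contain spaces if it is introduced by a colon. Here is a minimal parsing sketch under those assumptions. The `IrcMessage` type and `parse_irc_line` function are our own names, and the sketch ignores newer extensions such as message tags.

```python
from typing import List, NamedTuple, Optional


class IrcMessage(NamedTuple):
    prefix: Optional[str]  # who sent the message, if stated
    command: str           # for example PRIVMSG, JOIN, or PING
    params: List[str]      # targets, plus trailing text as the last item


def parse_irc_line(line: str) -> IrcMessage:
    """Parse one line of the classic IRC text protocol."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # A leading colon introduces the optional sender prefix.
        prefix, line = line[1:].split(" ", 1)
    if " :" in line:
        # Everything after " :" is a single trailing parameter.
        head, trailing = line.split(" :", 1)
        parts = head.split()
        parts.append(trailing)
    else:
        parts = line.split()
    return IrcMessage(prefix, parts[0].upper(), parts[1:])
```

A chat message from a hypothetical user, such as `:alice!a@example.org PRIVMSG #general :Hello, world!`, parses into the sender prefix, the `PRIVMSG` command, and two parameters: the channel and the text.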
Cons of IRC
- Ephemerality: To receive messages, an individual user must be connected to the relevant IRC server. Any messages sent while someone is disconnected are effectively lost.
- Text-only: IRC has no support for images, file sharing, or other rich media.
- Lack of security: IRC dates from a time when the internet was used mostly as a tool for academia. As such, it has minimal security features.
How to choose the right chat protocol for your business
With so much variety, how do you select the right chat protocol for your specific business needs? Here are five areas you’ll need to consider.
Where it sits in the architecture
Each of the protocols we’ve looked at focuses on a specific area of the problem space and then takes its own approach to the solution. For example, IRC is an end to end chat solution, albeit rather lacking by today’s standards. WebSocket, on the other hand, provides the underlying infrastructure to build your own chat application on top.
Read our blog for more on chat app architecture.
Scalability
Architecting for scalability will prepare your chat application for growth. The protocols we’ve considered fall into three broad scaling categories:
- Limited scalability: Some aspects of a protocol fundamentally make scaling harder. For example, it can be harder to scale MQTT’s centralized message broker.
- Horizontal scaling: Adding more server instances to work alongside each other enables you to increase the capacity of your chat platform. XMPP and WebSocket both enable you to scale this way.
- Federated: Individual instances of the chat protocol exist on separate servers but they communicate with each other to exchange messages. This is how Matrix works.
Reliability
Message delivery guarantees, fault tolerance, and recovery from errors differ greatly from one protocol to the next. MQTT, for example, is designed to handle constrained network conditions and so gives you control over what level of message delivery guarantees versus additional network overhead are appropriate. The increased complexity of a federated protocol, such as Matrix, can make it harder to build a reliable solution.
Security and privacy
Some protocols come with several layers of security and privacy, while others such as IRC have virtually none. The first question is, “What does the protocol itself handle?” Here are some of the security and privacy factors that you’ll need to consider:
- Connection encryption: Does the protocol encrypt the connection between client and server (or peers) itself, as with AMQP? Or is that left to the transport layer, as in the case of XMPP, or another third-party tool?
- Data in transit: On top of transport layer encryption, does the protocol encrypt messages at the application layer, too? That adds an additional layer of protection against snooping.
- Data at rest: What happens to messages on the client and the server? Are they stored in the clear or are they encrypted?
- User authentication: How well is user authentication handled? Is it left entirely to trust, as in the case of IRC, or will it integrate with existing SSO providers?
- Response to vulnerabilities: Do the maintainers of the protocol and its implementations fix vulnerabilities in a timely manner? Or could the protocol itself become an attack vector?
Initial and ongoing cost
An all-in-one protocol might have a lower cost of initial implementation but lack of flexibility could mean that customizations are harder and so more costly. Similarly, each protocol makes different demands on the server-side, the network, and the clients. A heavier weight protocol might need more network bandwidth, for example, which could increase the costs of scaling as time goes on.
Maintaining a solution based on different protocols will also have different impacts on your staffing, licensing, and cloud costs. A protocol that can easily use a platform as a service, such as WebSocket on Ably, will save you both initial development cost and ongoing maintenance costs.
Getting started with instant messaging and chat
Choosing the protocol that powers your chat and instant messaging application opens up another question: how much of the solution do you build in-house? There are four main options:
- Build entirely in-house: Building your own solution can be tempting. If you’re responsible for everything then it seems as though you can build precisely what you need. The experience of many engineering teams is that building in-house leads to more compromise as more time must go towards building and maintaining foundational technologies, rather than innovation. Similarly, long-term costs can escalate quickly as maintenance, security responses, and feature development each make demands on limited resources.
- Use a platform as a service (PaaS): Rather than building everything in-house, using a realtime PaaS, such as Ably’s, gives you the tooling and infrastructure to create your own solution but without the additional burden of creating and maintaining every part of the tech stack. That gives you back the engineering resources to innovate where it matters.
- Use an open source chat system: There are open source servers and clients for protocols such as IRC and Matrix that you can deploy and configure to your needs. However, you remain bound by the feature roadmap, engineering prowess, and security responsiveness of the developer communities behind those projects. Depending on your needs, customization could be costly or even entirely uneconomic.
- Use a chat as a service: Similarly, there are commercial services that provide a white label chat service. Customization options vary but, at the least, you can add your own branding and integrate with existing services to some extent. While this is one way to get up and running quickly, one downside is that you’re committed to the pricing and feature roadmap of the vendor you choose.
Whichever option you go for, the choice you make and the underlying protocol(s) that power your chat or instant messaging will have a material impact on the end user experience, your ability to deliver new features, and the ongoing cost of maintenance.
For more detail on the impact of each of these options, take a look at our article on the pros and cons of building your own solution versus other options such as using a realtime PaaS.
Upgrade your chat experience with Ably
Building your chat or instant messaging application with Ably means that you can choose the protocol or protocols that best suit your needs and trust the infrastructure to our expert team.
Whether you settle on WebSocket, MQTT, AMQP, or Server Sent Events, Ably can help you find the happy medium between building in-house and outsourcing. Our global edge network means that we can offer guarantees and SLAs across performance, data integrity, reliability, and availability. In particular, building with Ably means your application benefits from:
- Predictable performance: Chat users will enjoy a realtime experience, thanks to Ably’s low-latency and high-throughput global edge network, with median latencies of <65ms.
- Guaranteed ordering and delivery: Messages are delivered in order and exactly once, even after disconnections.
- Fault tolerant infrastructure: Redundancy at regional and global levels with 99.999% uptime SLAs.
- High scalability and availability: Built and battle-tested to handle millions of concurrent connections at effortless scale.
- Optimized build times and costs: Deployments typically see a 21x lower cost and upwards of $1M saved in the first year.
Start building with Ably by signing up for your free developer account.