Guide: Export chat data to your own systems
Ably Chat is a simple and easy-to-use realtime chat solution that handles any scale from 1:1 and small group chats to large livestream chats with millions of users.
Ably holds data for the purpose of providing realtime experiences. While Ably Chat provides flexible data retention for messages (30 days by default, up to a year on request), this guide discusses options for longer-term storage or additional control over data.
This guide shows you how to export data from Ably Chat to your own systems. Before diving into the technical implementation, it's important to understand your architectural goals and what role your database will play.
Why export chat data?
Exporting chat data from Ably Chat to your own systems can be beneficial for many use cases, for example:
- Compliance and legal requirements: Meet data retention policies, maintain audit trails for support conversations, or fulfill regulatory requirements.
- Analytics and business intelligence: Build dashboards, train ML models, analyze customer sentiment, or track support quality metrics.
- Enhanced functionality: Implement features that need the chat history, such as search.
- Single source of truth: Maintain your own database as the canonical source of truth.
- Long-term storage: Store chat data for longer than Ably Chat supports.
Key considerations
Consider the following when exporting chat data:
- Database schema: Design your schema to allow you to easily build the features you need, keeping in mind scale and reliability.
- Version history requirements: Decide whether you need to store all versions of messages or just the latest version (see Decision 1 below).
- Concurrent writes: New messages, updates, and deletes will arrive concurrently, so your database system must handle this. Depending on your database, consider reducing roundtrips, managing locks, and handling race conditions.
- Scale and reliability trade-offs: Depending on the scale of your application, you need to consider how you will scale up and down the parts that handle the ingestion of messages from Ably.
- Data latency and consistency: When publishing, there will be a small delay between a message being published and it arriving in your database via integrations. If you need your database to be the source of truth, consider publishing via your own servers.
- Handling messages beyond retention period: Consider how to retrieve and display messages that are older than Ably Chat's retention period, especially when they need to appear in a chat window alongside newer messages.
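For the last point, one approach is to load older messages from your own database and recent ones from Ably Chat history, then merge the two lists. The sketch below assumes that both lists already contain decoded chat messages with serial and version.serial properties. mergeHistories is a hypothetical helper, not part of the Chat SDK.

```javascript
// Hypothetical helper: merges messages from your own database (older, beyond
// Ably's retention period) with messages fetched from Ably Chat history.
// Assumes each message has string `serial` and `version.serial` properties.
function mergeHistories(dbMessages, ablyMessages) {
  const bySerial = new Map();
  for (const msg of [...dbMessages, ...ablyMessages]) {
    const existing = bySerial.get(msg.serial);
    // Keep the newest version: a lexicographically higher version.serial is newer.
    if (!existing || msg.version.serial > existing.version.serial) {
      bySerial.set(msg.serial, msg);
    }
  }
  // Sorting by serial (as the history API does) gives the canonical global order.
  return [...bySerial.values()].sort((a, b) =>
    a.serial < b.serial ? -1 : a.serial > b.serial ? 1 : 0
  );
}
```

Because the merge is keyed by serial, a message that appears in both sources is shown once, in its newest version.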
Implementation options
With your strategy in mind, choose the technical approach that fits your needs:
- Using outbound webhooks (HTTP endpoint, AWS Lambda, and others).
- Using outbound streaming (stream to your own Kafka, Kinesis, and others).
- Using an Ably queue.
- Publishing via your own servers.
- Using the Chat History endpoint.
Decoding and storing messages
Regardless of the delivery mechanism, you will need to decode the received messages into Chat messages. Details of the mapping from Ably Pub/Sub messages to Chat messages are available in the chat integrations documentation.
After decoding a received message into a chat Message object, you can save it to your own database.
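The chat integrations documentation defines the exact field mapping. The function below is only a rough sketch, and it rests on several assumptions that you should check against that documentation:

- A Pub/Sub message carries the chat text and metadata in data.
- It carries the chat headers in extras.headers.
- It carries the version details in version.
- Chat room channels are named with a ::$chat suffix.

decodeChatMessage and roomNameFromChannel are hypothetical names.

```javascript
// ASSUMPTION: chat room channel names are the room name plus this suffix.
const CHAT_CHANNEL_SUFFIX = "::$chat";

function roomNameFromChannel(channelName) {
  return channelName.endsWith(CHAT_CHANNEL_SUFFIX)
    ? channelName.slice(0, -CHAT_CHANNEL_SUFFIX.length)
    : channelName;
}

// Converts one Pub/Sub message (as delivered by an integration) into a plain
// chat-message object. The field mapping below is an ASSUMPTION to verify
// against the chat integrations documentation.
function decodeChatMessage(channelName, pubsubMessage) {
  let data = pubsubMessage.data;
  // Integrations may deliver JSON payloads as a string with encoding "json".
  if (typeof data === "string" && pubsubMessage.encoding === "json") {
    data = JSON.parse(data);
  }
  return {
    roomName: roomNameFromChannel(channelName),
    serial: pubsubMessage.serial,
    action: pubsubMessage.action,
    clientId: pubsubMessage.clientId,
    text: data?.text ?? "",
    metadata: data?.metadata ?? {},
    headers: pubsubMessage.extras?.headers ?? {},
    timestamp: pubsubMessage.timestamp,
    version: pubsubMessage.version,
  };
}
```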
There are two decisions to make when saving messages.
Decision 1: Full version history or just the latest version?
Do you need all versions of a message or just the latest version?
- Messages are uniquely identified by their serial. Message versions are identified by the message's version.serial property.
- A lexicographically higher version.serial means a newer version.
- If you need to store all versions of a message, uniquely index by roomName, serial and version.serial.
- If you only need the latest version of a message, uniquely index by roomName and serial, and only update if the received version.serial is greater than what you have stored. This handles out-of-order delivery.
- When performing a message search or lookup, do you want to return only the latest version of each message, even if you store the full version history?
- If you are looking to hydrate chat windows from your own database, think of how to efficiently retrieve the latest version of each message for a time window. For example, this can be implemented via a separate table or by iterating through all versions and filtering old versions out.
```javascript
const saveMessageVersion = async (roomName, message) => {
  if (message.action === 'message.summary') {
    // Summary events are not part of the message version history, so discard them
    return;
  }
  // Pseudo-code: only insert if you don't already have this message version.
  // insertIfNotExists is a placeholder; its implementation depends on your
  // database's upsert/conflict handling capabilities.
  await insertIfNotExists(roomName, message.serial, message.version.serial, message);
};
```

Read more about message versioning and sorting in the messages documentation.
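The sketch below makes the indexing strategies from Decision 1 concrete, using in-memory Maps as stand-ins for database tables:

- insertVersion stores every version once (full history).
- upsertLatest keeps only the newest version of each message.
- latestFromHistory derives the latest version of each message from the full history, for hydrating a chat window.

The class and method names are hypothetical.

```javascript
// In-memory stand-ins for database tables, for illustration only.
class ChatMessageStore {
  constructor() {
    this.versions = new Map(); // (roomName, serial, version.serial) -> entry
    this.latest = new Map(); // (roomName, serial) -> entry
  }

  // Unambiguous composite key, playing the role of a unique index.
  static key(...parts) {
    return JSON.stringify(parts);
  }

  // Full history: insert each version at most once, so redelivery is harmless.
  insertVersion(roomName, message) {
    const key = ChatMessageStore.key(roomName, message.serial, message.version.serial);
    if (!this.versions.has(key)) this.versions.set(key, { roomName, message });
  }

  // Latest only: overwrite only when the incoming version.serial is higher,
  // so out-of-order or duplicate deliveries never replace newer data.
  upsertLatest(roomName, message) {
    const key = ChatMessageStore.key(roomName, message.serial);
    const stored = this.latest.get(key);
    if (!stored || message.version.serial > stored.message.version.serial) {
      this.latest.set(key, { roomName, message });
    }
  }

  // Read side for full history: collapse all versions of a room to the newest.
  latestFromHistory(roomName) {
    const newest = new Map(); // serial -> message
    for (const { roomName: r, message } of this.versions.values()) {
      if (r !== roomName) continue;
      const current = newest.get(message.serial);
      if (!current || message.version.serial > current.version.serial) {
        newest.set(message.serial, message);
      }
    }
    return [...newest.values()].sort((a, b) =>
      a.serial < b.serial ? -1 : a.serial > b.serial ? 1 : 0
    );
  }
}
```

The sketch compares serials with the plain < and > operators, which compare strings by UTF-16 code units. localeCompare is locale-sensitive and is not a safe substitute for lexicographic order.

In a relational database, the latest-version rule can usually be expressed as a single conditional upsert, such as PostgreSQL's INSERT ... ON CONFLICT ... DO UPDATE ... WHERE. This avoids a read-then-write race between concurrent deliveries.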
Decision 2: How to store message reactions?
If you need to store message reactions you need to consider the following:
- Do you need to store only the current state of the reactions, historic snapshots of the current state, or the full history of all individual reactions?
  - If you only need the current state (latest summary), simply save the values provided in the latest message with action message.summary. Uniquely index by roomName and serial.
  - If you need to store historic snapshots, store all message.summary events for every message. Note that when a message receives many reactions in a short amount of time, summaries can be rolled up for cost and bandwidth optimisation, so not every reaction gets a summary event published.
- Do you have a requirement to store the list of clientIds who reacted to a message, or just the totals?
  - If you only need the totals, simply use the values provided in each message with action message.summary.
  - If you need the list of clientIds who reacted, use the values from reaction summaries.
If you do not need to store message reactions, you can discard them. Don't store the reactions (or annotations) field, and ignore messages with action message.summary.
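Below is a minimal sketch of the three options: discard reactions, keep only the latest summary, or keep every summary as a snapshot. It assumes each decoded summary event has action message.summary, the serial of the message that was reacted to, and its aggregates in a reactions field. ReactionStore is a hypothetical name, and its Map and array stand in for database tables.

```javascript
// Hypothetical reaction-summary handling. `mode` selects one of the options
// discussed above: "none", "latest" or "snapshots".
class ReactionStore {
  constructor(mode) {
    this.mode = mode;
    this.latest = new Map(); // (roomName, serial) -> latest reactions summary
    this.snapshots = []; // every summary event received
  }

  handle(roomName, event) {
    if (event.action !== "message.summary") return; // not a reaction summary
    if (this.mode === "none") return; // reactions not needed: discard
    const key = JSON.stringify([roomName, event.serial]);
    if (this.mode === "latest") {
      // Overwrite to keep the current state only. Assumes summaries for a
      // message are processed in publish order; this sketch has no ordering
      // key with which to reject a stale summary.
      this.latest.set(key, event.reactions);
    } else if (this.mode === "snapshots") {
      // Rolled-up summaries mean not every individual reaction appears here.
      this.snapshots.push({
        roomName,
        serial: event.serial,
        receivedAt: Date.now(),
        reactions: event.reactions,
      });
    }
  }
}
```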
Filtering rooms and event types
Integrations allow you to filter which Ably channels are forwarded to your own system using a regular expression on the channel name. This is a simple way to reduce the volume of messages you need to process by only receiving messages from the chat rooms you are interested in. Use a common prefix in the name of chat rooms that you want to trigger an integration for, and use the prefix as the filter.
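For example, if every support room name starts with support:, a filter such as ^support: selects only those rooms. Both the prefix and the ::$chat suffix in the channel names below are assumptions for illustration. Only the start of the name is anchored, because the underlying channel name is assumed to carry a suffix after the room name. The check uses JavaScript's RegExp, so confirm that any more complex pattern behaves the same way in the integration rule.

```javascript
// Example channel filter, assuming all support rooms share the "support:" prefix.
const channelFilter = "^support:";
const filter = new RegExp(channelFilter);

const channels = [
  "support:ticket-123::$chat", // matches: forwarded to the integration
  "livestream:main::$chat", // no match: not forwarded
];
for (const name of channels) {
  console.log(name, filter.test(name));
}
```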
Use channel.message as the event type for all integration types. This will forward all messages published to the relevant channels and exclude presence messages and channel lifecycle messages.
Select enveloped messages when setting up your integrations to receive all the metadata about the message, including the serial, version, and extras (which include the headers of a chat message).
Using a webhook
Ably can forward messages to your own system via a webhook. This is the simplest to set up if you don't already have other systems in place for message ingestion. This section covers the simple HTTP endpoint webhook, but the same principles apply to other webhook integrations such as AWS Lambda, Azure Function, Google Function, and others.
Read the guide on outbound webhooks for more details on how to set up a webhook with Ably for the platform of your choice.
You need to consider:
- Redundancy: In case of failure, Ably will retry delivering the message to your webhook, but only for a short period. You can see errors in the [meta]log channel.
- Ordering: Messages can arrive out-of-order. You can sort them using their serial and version.serial properties.
- Consistency: Webhook calls that fail will lead to inconsistencies between your database and Ably, which can be difficult to resolve. Detect if this happens using the [meta]log channel and use the history endpoint to backfill missing data.
- At-least-once delivery: You need to handle duplicate messages. Deduplication can be done by checking serial and version.serial.
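Below is a minimal sketch of an HTTP webhook endpoint using Node.js's built-in http module. It rests on these assumptions:

- The enveloped webhook body has the shape { items: [{ data: { channelId, messages } }] }. Verify this against the outbound webhooks documentation.
- An in-memory Set stands in for a unique database index.
- Webhook signatures are not verified.

handleWebhookBody and startServer are hypothetical names.

```javascript
const http = require("node:http");

// Stand-in for a database unique index on (channel, serial, version.serial).
const seen = new Set();
const saved = [];

// Extracts Pub/Sub messages from an enveloped webhook body. ASSUMED shape:
// { items: [ { data: { channelId, messages: [...] } } ] }
function handleWebhookBody(body) {
  let stored = 0;
  for (const item of body.items ?? []) {
    const channel = item.data?.channelId;
    for (const msg of item.data?.messages ?? []) {
      if (msg.action === "message.summary") continue; // reactions: see Decision 2
      const key = JSON.stringify([channel, msg.serial, msg.version?.serial]);
      if (seen.has(key)) continue; // at-least-once delivery: skip duplicates
      saved.push({ channel, msg }); // replace with a real, idempotent database write
      seen.add(key);
      stored++;
    }
  }
  return stored;
}

function startServer(port) {
  return http
    .createServer((req, res) => {
      let raw = "";
      req.on("data", (chunk) => (raw += chunk));
      req.on("end", () => {
        try {
          handleWebhookBody(JSON.parse(raw));
          res.writeHead(200).end(); // acknowledge only after the data is stored
        } catch {
          res.writeHead(500).end(); // non-2xx: the delivery counts as failed
        }
      });
    })
    .listen(port);
}
```

The endpoint responds with a 2xx status only after the write succeeds. A failed write therefore surfaces as a failed delivery, which Ably retries for a short period and reports in the [meta]log channel.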
Using outbound streaming
Ably can stream messages directly to your own queueing or streaming service: Kinesis, Kafka, AMQP, SQS. Read the guide on outbound streaming for more details on how to set up the streaming integration with Ably for the service of your choice.
Benefits:
- Use your existing queue system to process and save messages from Ably.
- You control your own queue system, so you have full control over message ingestion from queue to database in terms of retry strategies, retention policies, queue lengths, and so on.
You need to consider:
- You need to maintain and be responsible for a reliable queue system. If you don't already have such a system, it increases complexity on your end.
- Consistency: If your queue system is not reachable, you will lose messages. Errors can be seen in the [meta]log channel.
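Every streaming target has its own client library, so the sketch below stays generic. It retries each database write with exponential backoff and commits the batch (for example, a Kafka offset) only once every record is stored. saveMessage and commit are hypothetical callbacks that you would wire to your own database and stream client.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries one write with exponential backoff. `saveMessage` is a hypothetical,
// idempotent write (for example keyed by serial and version.serial).
async function saveWithRetry(saveMessage, record, { attempts = 5, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await saveMessage(record);
    } catch (err) {
      if (attempt >= attempts) throw err; // give up: surface for alerting
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 100 ms, 200 ms, 400 ms, ...
    }
  }
}

// Commits (acknowledges) the batch only after every record is stored, so a
// crash mid-batch leads to redelivery rather than data loss.
async function processBatch(records, saveMessage, commit) {
  for (const record of records) {
    await saveWithRetry(saveMessage, record);
  }
  await commit();
}
```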
Using an Ably queue
Ably can forward messages from chat room channels to an Ably Queue, which you can then consume from your own servers to save messages to your own database. Read the guide on Ably queues for more details on how to set up the queue integration with Ably.
Ably ensures that each message is delivered to only one consumer even if multiple consumers are connected.
Benefits of using an Ably queue:
- You can consume it from your servers, meaning overall this is fault-tolerant. Ably takes care of the complexity of maintaining a queue.
- You can use multiple queues and configure which channels go to which queue via regex filters on the channel name.
- Fault-tolerant: if your systems suffer any temporary downtime, you will not miss messages, up to the queue max size. There is a dead letter queue to handle the situation where messages are dropped from the Ably Queue.
You need to consider:
- During peak times you may need to scale up your consumers to avoid overloading the queue past the maximum queue length allowed.
- Each message has a time-to-live in the queue. The default and maximum is 60 minutes.
- Oldest messages are dropped if the maximum queue length is exceeded. Check the dead letter queue to see if this is happening.
- Always consume messages from the dead letter queue to monitor errors.
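Below is a sketch of a queue consumer using the third-party amqplib package for Node.js. Ably queues can be consumed over AMQP. The sketch rests on these assumptions:

- The body shape { channel, messages } is the enveloped format.
- The URL and queue name are placeholders to copy from your queue's settings.
- saveMessage is a hypothetical idempotent database write.

```javascript
// Pure handler: parses one enveloped queue message and stores its chat messages.
// ASSUMED body shape: { channel, messages: [ ...Pub/Sub messages ] }
async function handleQueueBody(rawBody, saveMessage) {
  const envelope = JSON.parse(rawBody);
  for (const msg of envelope.messages ?? []) {
    await saveMessage(envelope.channel, msg); // hypothetical idempotent DB write
  }
}

// Wiring with amqplib (npm install amqplib). `url` and `queueName` are
// placeholders; copy the real values from your Ably queue settings.
async function consumeAblyQueue(url, queueName, saveMessage) {
  const amqp = require("amqplib");
  const connection = await amqp.connect(url);
  const channel = await connection.createChannel();
  await channel.prefetch(50); // cap unacknowledged messages per consumer
  await channel.consume(queueName, async (msg) => {
    if (msg === null) return; // the server cancelled this consumer
    try {
      await handleQueueBody(msg.content.toString(), saveMessage);
      channel.ack(msg); // acknowledge only after saving
    } catch (err) {
      console.error("failed to store queue message", err);
      channel.nack(msg); // requeue for another attempt; the message TTL still applies
    }
  });
  return connection;
}
```

Ably delivers each queue message to only one consumer. You can therefore scale ingestion during peak times by running consumeAblyQueue in several processes against the same queue.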
Publishing via your own servers
Change the publish path: instead of publishing Chat messages, updates, and deletes to Ably directly, proxy them through your own server. This gives you the opportunity to also save the messages as they are produced, and also apply different validation schemes if needed.
Benefits:
- Full control over publishing.
- Opportunity to add extra validation before publishing to Ably.
- Opportunity to add extra processing or business logic before publishing to Ably.
- You can publish messages directly via the Chat REST API, and avoid having to encode/decode Chat Messages to and from Ably Pub/Sub messages.
- If you are using a supported language, you have the option to publish via the Chat SDK.
You need to consider:
- You need to handle updates and deletes on your own, including all consistency issues that arise from this.
- Storing message reactions will require using one of the other methods presented in this guide, otherwise you will not have access to the aggregates (summaries) that Ably provides.
- Your own servers are in the middle of the message publish path, so they can become a bottleneck in availability and will add latency in the publish path.
- Your own servers will need to handle the scale you operate at for realtime publishes.
- Keeping both systems in sync can be a difficult problem to solve in all edge cases. Inconsistencies can happen if either publishing to Ably or saving to your own database fails. You will need mitigation strategies to handle all failure scenarios to ensure eventual consistency.
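One common mitigation is to save first and publish second. Record the message as pending in your database, publish it to Ably, then store the serial that Ably returns. A background job retries rows that are still pending.

This is a general pattern rather than an Ably API. db and publishToAbly are hypothetical. publishToAbly is assumed to wrap the Chat REST API or Chat SDK and to resolve with the new message's serial.

```javascript
// Hypothetical "save first, then publish" flow with injected dependencies.
async function sendChatMessage(db, publishToAbly, { roomName, clientId, text, localId }) {
  // 1. Record the message first, keyed by your own id (localId).
  await db.insertPending({ localId, roomName, clientId, text });
  // 2. Publish to Ably Chat. If this throws, the row stays pending and
  //    retryPending() picks it up later.
  const serial = await publishToAbly(roomName, { text });
  // 3. Link your row to the Ably message so later updates and deletes can be matched.
  await db.markPublished(localId, serial);
  return serial;
}

// Background job: re-publishes rows that never reached Ably.
async function retryPending(db, publishToAbly) {
  for (const row of await db.listPending()) {
    try {
      const serial = await publishToAbly(row.roomName, { text: row.text });
      await db.markPublished(row.localId, serial);
    } catch {
      // Still failing: leave the row pending for the next run.
    }
  }
}
```

This sketch does not cover every failure. If the publish succeeds but markPublished fails, retryPending will publish the message a second time, so a production version needs a way to detect that case.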
Using the Chat History endpoint
You can fetch the message history of a chat room using the Chat History endpoint or the Chat SDK. The chat room history endpoint is a paginated HTTP endpoint that allows you to retrieve messages from a chat room. The Chat SDK provides a convenient way to fetch the history of a chat room.
If your use case is to archive chats that have ended, such as to export the chat history of a support ticket that is closed, you can use the chat history endpoint to export the messages to your own system. Read the docs on chat history for more details.
The intended use of the chat history endpoint is to retrieve messages for pre-filling a chat window, not for continuous ingestion into other systems. As a result, there are some important things to consider:
- The history endpoint is not a changelog; it is a snapshot of the messages in the room at the time the request is made. It returns only the latest version of each message.
- The history API returns messages in their canonical global order (sorted by serial).
- You will need to decide when and which rooms to import messages from. The metachannel [meta]channel.lifecycle provides events when channels are opened and closed, but your business logic might provide a better solution, for example importing when a support ticket is closed or a game session ends.
- You can import the same room multiple times (deduplicate by serial and version.serial). However, to capture all updates and deletes, you will need to fetch from the first message each time, which can be impractical for long-running chats with large message histories.
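Below is a sketch of exporting a whole room by walking every page of history. It assumes the first page follows Ably's PaginatedResult shape (items, hasNext() and next()), such as the result of the Chat SDK's message history call. The exact method name depends on your SDK version. exportRoomHistory and saveMessage are hypothetical.

```javascript
// Walks every page of a paginated history result and stores each message.
// `firstPage` is assumed to look like { items, hasNext(), next() }, where
// next() resolves with the following page (or null).
async function exportRoomHistory(roomName, firstPage, saveMessage) {
  let page = firstPage;
  let count = 0;
  while (page) {
    for (const message of page.items) {
      // History returns only the latest version of each message, so an upsert
      // keyed by (roomName, serial) and guarded by version.serial fits here.
      await saveMessage(roomName, message);
      count++;
    }
    page = page.hasNext() ? await page.next() : null;
  }
  return count;
}
```

Passing in the first page keeps the function independent of how it was fetched, which also makes it straightforward to test with fake pages.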
For use cases where there is a clear start and end of the chat, exporting the chat via history requests is a simple, reliable solution. If there is no clear start and end for chats, if you require continuous ingestion, or if you need the full message version history, please consider using one of the other methods mentioned in this guide.