Connection state and recovery
Connections to Ably will transition through multiple states throughout their lifecycle. States can be observed and triggered using methods available on the connection object.
Although connection state is temporary, Ably provides continuity of message delivery between a client and the service provided that a dropped connection is re-established by the client within a limited interval (typically around two minutes). After that time the connection becomes stale and the system will not attempt to recover the connection state.
An Ably SDK is responsible for managing a connection. This includes:
- selecting a transport, when multiple transports are available
- selecting a new host to connect to when automatically falling back to an alternate datacenter if needed
- managing continuity of operation when the connection drops
When an SDK is instantiated it will establish a connection immediately, and if the connection drops at any time it will attempt to re-establish it by making repeated connection attempts every 15 seconds for up to two minutes.
Available connection states
The different connection states are:
- initialized: A Connection object has been initialized but not yet connected.
- connecting: A connection attempt has been initiated. This state is entered as soon as an SDK has completed initialization, and is re-entered each time connection is re-attempted following disconnection.
- connected: A connection exists and is active.
- disconnected: A temporary failure condition when no current connection exists. The disconnected state is entered if an established connection is dropped, or if a connection attempt is unsuccessful. In the disconnected state an SDK will periodically attempt to open a new connection (approximately every 15 seconds), anticipating the connection will be re-established soon and connection and channel continuity will be possible. In this state, developers can continue to publish messages as they are automatically placed in a local queue and sent when the connection is re-established. Messages published by other clients whilst this client is disconnected will be delivered to it when reconnected if the connection was resumed within two minutes. After two minutes have elapsed, recovery is no longer possible and the connection will move to the suspended state.
- suspended: A long term failure condition when no current connection exists because there is no network connectivity or available host. The suspended state is entered after a failed connection attempt if there has then been no connection for a period of two minutes. In the suspended state, an SDK will periodically attempt to open a new connection every 30 seconds. Developers are unable to publish messages in this state. A new connection attempt can also be triggered by an explicit call to connect() on the Connection object. Once the connection has been re-established, channels will be automatically re-attached. The client has been disconnected for too long for them to resume from where they left off, so if it wants to catch up on messages published by other clients while it was disconnected, it needs to use the history API.
- closing: An explicit request by the developer to close the connection has been sent to the Ably service. If a reply is not received from Ably shortly, the connection will be forcibly terminated and the connection state will become closed.
- closed: The connection has been explicitly closed by the client. In the closed state, no reconnection attempts are made automatically by an SDK, and clients may not publish messages. No connection state is preserved by the service or by an SDK. A new connection attempt can be triggered by an explicit call to connect() on the Connection object, which will result in a new connection.
- failed: This state is entered if an SDK encounters a failure condition that it cannot recover from. This may be a fatal connection error received from the Ably service (e.g. an attempt to connect with an incorrect API key), or some local terminal error (e.g. the token in use has expired and the SDK does not have any way to renew it). In the failed state, no reconnection attempts are made automatically by an SDK, and clients may not publish messages. A new connection attempt can be triggered by an explicit call to connect() on the Connection object.
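The retry and publishing behaviour described for the failure states can be summarised as a small lookup table. This is a minimal sketch rather than part of any SDK: the names STATE_BEHAVIOR and needsExplicitConnect are hypothetical, and the values are taken from the state descriptions above.

```javascript
// Hypothetical summary table; values come from the state descriptions above.
const STATE_BEHAVIOR = {
  disconnected: { autoRetrySeconds: 15, publish: 'queued locally' },
  suspended: { autoRetrySeconds: 30, publish: 'not possible' },
  closed: { autoRetrySeconds: null, publish: 'not possible' },
  failed: { autoRetrySeconds: null, publish: 'not possible' },
};

// Returns true when the SDK will not retry on its own, so the application
// has to call connect() on the Connection object to try again.
function needsExplicitConnect(state) {
  const behavior = STATE_BEHAVIOR[state];
  return behavior !== undefined && behavior.autoRetrySeconds === null;
}
```

For example, needsExplicitConnect('failed') is true, while needsExplicitConnect('suspended') is false because an SDK keeps retrying in that state.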
There are a number of potential connection state sequences, but some of the more common sequences are covered in this section.
The SDK is initialized and initiates a successful connection:
initialized → connecting → connected
An existing connection is dropped and re-established on the first attempt:
connected → disconnected → connecting → connected
An existing connection is dropped, and re-established after several attempts but within a two minute interval:
connected → disconnected → connecting → disconnected → … → connecting → connected
There is no connection established after initializing the SDK:
initialized → connecting → disconnected → connecting → … → suspended
After a period of being offline a connection is re-established:
suspended → connecting → suspended → … → connecting → connected
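To see which of these sequences a client actually follows, the state names can be recorded as they are emitted. The following is a minimal sketch, assuming connection is the Connection object of a Realtime client with a state property and an on() method; recordStateSequence is a hypothetical helper name.

```javascript
// Hypothetical helper: collects the states a connection passes through.
function recordStateSequence(connection) {
  // Start from the state the connection is in right now.
  const states = [connection.state];
  // A listener registered without an event name receives every event.
  connection.on((stateChange) => {
    // Events that leave the state unchanged (such as update) are skipped.
    if (stateChange.current !== states[states.length - 1]) {
      states.push(stateChange.current);
    }
  });
  return { describe: () => states.join(' → ') };
}
```

Calling describe() on the returned object after a successful first connection would be expected to give the first sequence listed above.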
The Connection object is an EventEmitter and emits an event whose name is the new state whenever there is a connection state change. An event listener function is passed a ConnectionStateChange object as the first argument for state change events.
The Connection object can also emit an event that is not a state change: an update event. This happens when there’s a change to connection conditions and there is no applicable status or the state doesn’t change, such as when an SDK remains connected after a reauth.
realtime.connection.on('connected', (stateChange) => {
  console.log('Ably is connected');
});
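An update event can be handled in much the same way as a state change. The following is a minimal sketch, assuming the update listener receives a ConnectionStateChange object with current and an optional reason, as state change listeners do; onConnectionUpdate is a hypothetical helper name.

```javascript
// Hypothetical helper: reports update events, which do not change the state.
function onConnectionUpdate(connection, report) {
  connection.on('update', (stateChange) => {
    // The state itself is unchanged, so only the reason (if any) is of interest.
    const reason = stateChange.reason ? stateChange.reason.message : 'no reason given';
    report('Connection updated while ' + stateChange.current + ': ' + reason);
  });
}
```

A typical call might be onConnectionUpdate(realtime.connection, console.log).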
Alternatively a listener may be registered so that it receives all state change events:
realtime.connection.on((stateChange) => {
  console.log('New connection state is ' + stateChange.current);
});
Previously registered listeners can be removed individually or all together:
/* remove a listener registered for a single event */
realtime.connection.off('connected', myListener);
/* remove a listener registered for all events */
realtime.connection.off(myListener);
/* remove all event listeners */
realtime.connection.off();
Be aware that when registering listeners for connection state changes, certain repeating states may add new listeners each time. For example, registering a listener for on('connected') adds a new listener each time a client becomes connected, even if this is a reconnection after being offline for a period of time.
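One way to avoid accumulating duplicate listeners is to remove a handler before registering it, so that repeated registration leaves exactly one copy. This is a minimal sketch using the on() and off() methods shown above, assuming that off() has no effect when the listener is not currently registered; ensureConnectedListener is a hypothetical helper name.

```javascript
// Hypothetical helper: registers a 'connected' listener at most once.
function ensureConnectedListener(connection, listener) {
  // Removing first makes the call idempotent: if the listener is already
  // registered it is removed, then exactly one copy is added back.
  connection.off('connected', listener);
  connection.on('connected', listener);
}
```

This only works when the same function reference is passed each time, so the listener should be a named function rather than a new inline arrow function per call.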
Connection state recovery
The Ably system preserves connection state to allow connections to continue transparently across brief disconnections. The connection state that is tracked includes the messages sent to the client on the connection, members present on a channel and the set of channels that the client is attached to.
There are two modes of connection state recovery: resume and recover.
Resume
The resume mode provides transparent recovery of a live client instance across disconnections. Upon disconnection, an SDK will automatically re-attempt connection and, once the connection is re-established, any missed messages will be sent to the client. The developer does not need to do anything to trigger this behavior; all client channel event listeners remain attached and are called when the backlog of messages is received.
There are limitations to resume recovery. Once a client has been disconnected for more than two minutes, the SDK moves into the suspended state, indicating that the connection state is lost. At this point all channels are automatically suspended, indicating that channel continuity is not possible. Once the connection is re-established, the SDK will reattach the suspended channels automatically and emit an attached event with the resumed flag set to false. As a developer, you can therefore listen for attached events and check the resumed flag to see whether a channel resumed fully and no messages were lost. If channel continuity is not possible and historical messages are important to you, you can use history to retrieve all older messages, with untilAttach set to true.
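The resumed check and the history call can be combined in one listener. The following is a minimal sketch, assuming a promise-based history() that resolves to a page object with an items array (callback-based SDK versions would need adapting); backfillWhenNotResumed, onItems and onError are hypothetical names. A channel attaching for the first time may also report resumed as false, so an application may want to treat that case separately.

```javascript
// Hypothetical helper: fetches history whenever a channel attaches
// without continuity.
function backfillWhenNotResumed(channel, onItems, onError) {
  channel.on('attached', (stateChange) => {
    // resumed === true means continuity was preserved; nothing to fetch.
    if (stateChange.resumed) return;
    // untilAttach limits history to messages from before this attachment.
    channel.history({ untilAttach: true }).then(
      (page) => onItems(page.items),
      onError
    );
  });
}
```

Only the first page of history is handled here; a fuller version would follow any further pages.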
Recover
The recover mode addresses the case in which a new SDK instance wishes to connect and recover the state of an earlier connection. This occurs typically in a browser environment when the page has been refreshed and therefore the client instance is disposed of and no client state is retained. In this case any message listeners associated with channels will no longer exist so it is not possible for an SDK simply to send the message backlog on reconnection; instead the client must re-subscribe to each channel it is interested in within 15 seconds, and its message listener(s) will be called with any message backlog for that channel. If it has any members in the presence set, they will need to explicitly re-enter. If the previously attached channels are not re-attached within 15 seconds of a connection being recovered, the client will lose the ability to continue the message stream from before; any subsequent attach() will result in a fresh attachment, with no backlog sent.
A client requests recovery of connection state by including a recovery string in the client options when instantiating the Realtime SDK. See connection state recover options for more information.
In either case, when a connection is resumed or recovered, the message backlog held on the server will be pushed to the client. However, any new messages published are sent as they become available rather than being held until the backlog has been delivered, because otherwise messages could be indefinitely deferred on very heavily loaded connections. Therefore the system does not guarantee that messages received after reconnection are delivered in the same order that would have occurred if the connection had not been dropped. In the recover case, in particular, the order of the message delivery depends on the timing of the reattachment of each channel.
To use recover mode, recovery must be requested in the client options when instantiating an SDK. Recovery requires that an SDK knows the previous connection’s recoveryKey value (which includes both the private unique Connection#key and the last message serial received on that connection). As the recovery key is never shared with any other clients, it allows Ably to safely resend message backlogs to the original client.
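The recovery key has to survive the page refresh for the new instance to use it. The following is a minimal sketch, assuming the key is read from the recoveryKey property of the Connection object and passed back through a client option named recover; the helper names, the storage key name, and the storage parameter (any object with setItem and getItem, such as the browser's sessionStorage) are hypothetical.

```javascript
const RECOVERY_STORAGE_KEY = 'ably-recovery-key'; // hypothetical storage key name

// Call before the page is torn down, e.g. from a 'beforeunload' handler.
function saveRecoveryKey(connection, storage) {
  if (connection.recoveryKey) {
    storage.setItem(RECOVERY_STORAGE_KEY, connection.recoveryKey);
  }
}

// Build the options for the next client instance, adding the recovery
// string only when one was saved.
function withRecovery(baseOptions, storage) {
  const saved = storage.getItem(RECOVERY_STORAGE_KEY);
  return saved ? { ...baseOptions, recover: saved } : { ...baseOptions };
}
```

The object returned by withRecovery would then be passed to the Realtime constructor, after which the client re-subscribes to its channels within the 15 second window described above.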
Handling connection failures
The client libraries will attempt to automatically recover from non-fatal error conditions. However, they will emit events to say what they’re doing, so you can handle them yourself if you prefer.
Fatal errors
Some classes of errors are fatal. These cause the connection to move to the FAILED state, and an SDK will not attempt any automatic recovery actions. For example, if your token expires and an SDK has no way to get a new token (neither authUrl nor authCallback is configured), the connection will enter the FAILED state.
While an SDK will not automatically attempt to reconnect in the FAILED state, explicit calls to connect() will make the client try again.
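Because nothing is retried automatically after a fatal error, the application decides when to call connect() again. The following is a minimal sketch, assuming the state change passed to a failed listener carries a reason describing the error; watchForFailure and onFailed are hypothetical names.

```javascript
// Hypothetical helper: reports fatal errors and returns a manual retry function.
function watchForFailure(connection, onFailed) {
  connection.on('failed', (stateChange) => {
    // Hand the error to the application; the SDK will not retry by itself.
    onFailed(stateChange.reason);
  });
  // Call this once the cause has been addressed, e.g. after supplying
  // fresh credentials. It does nothing unless the connection has failed.
  return function retry() {
    if (connection.state !== 'failed') return false;
    connection.connect();
    return true;
  };
}
```

Returning a retry function rather than reconnecting inside the listener avoids a loop of repeated failures when the underlying problem, such as an incorrect API key, has not been fixed.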
Nonfatal errors
Other classes of error are nonfatal. For example, a client may have network connectivity issues. An SDK will attempt to automatically reconnect and recover from these sorts of issues, as detailed in the DISCONNECTED and SUSPENDED explanations in the Available connection states section.
If message continuity is lost in the process, e.g. because you have been disconnected from Ably for more than two minutes, the SDK will notify you through the resumed flag mechanism.
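Nonfatal conditions can still be surfaced to users while an SDK retries in the background. The following is a minimal sketch, assuming the state change object exposes a retryIn value in milliseconds when a retry is scheduled; describeOutage and logOutages are hypothetical names.

```javascript
// Hypothetical helper: turns a nonfatal state change into a status message.
function describeOutage(stateChange) {
  const retry = typeof stateChange.retryIn === 'number'
    ? 'retrying in ' + Math.round(stateChange.retryIn / 1000) + 's'
    : 'retry time unknown';
  if (stateChange.current === 'disconnected') {
    return 'Temporarily disconnected, ' + retry + '; published messages are queued';
  }
  if (stateChange.current === 'suspended') {
    return 'Suspended, ' + retry + '; publishing is unavailable';
  }
  return null; // not a nonfatal outage state
}

// Logs a message for every disconnected or suspended state change.
function logOutages(connection, log) {
  connection.on((stateChange) => {
    const message = describeOutage(stateChange);
    if (message) log(message);
  });
}
```

A typical call might be logOutages(realtime.connection, console.warn).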