4 min read · Updated Jul 19, 2022

Redis scripts do not expire keys atomically

Andrew Dunstall

This short post by a member of Ably's engineering team describes how we resolved a problem that is typical of the challenges we face each week. We thrive on solving hard distributed system problems that are mostly platform agnostic and theoretical in nature, and this is the first post in a long-term series of articles about things we've learned recently.

How we use Redis at Ably

Ably is a platform for pub/sub messaging. Publishes are made on named channels, and clients subscribed to a given channel have all messages on that channel delivered to them. We use Redis, a distributed in-memory database for key-based storage, to store various entities such as authentication tokens and ephemeral channel state. It’s a good fit for temporary storage of messages while we process them.

We have billions of active Redis keys at any given time, sharded across numerous Redis instances. The sharding strategy places related keys in the same shard so that we can update them atomically. We use Redis Lua scripts extensively to query and update keys, and we rely on the atomicity of script execution to preserve the integrity of the values of related keys. That is, no other command executes while a script is running, so other clients observe either none of the script's effects or all of them.

We also use expiring keys extensively; the nature of the Ably service is that much of the state of a channel is ephemeral and only retained for a limited period of time (typically 2 minutes). We set keys to have a TTL so they auto-expire.

The issue

The integrity of a set of related keys requires that either all keys exist, or none exist. We had assumed that the atomic nature of script execution would also apply to expire operations invoked by a script. It turns out that naively expiring multiple keys in the same script does not preserve that integrity.

While expire operations execute atomically within the same script (with no opportunity for intervening operations to occur), nonetheless the timestamps associated with each expire operation are not necessarily the same.

Calling TIME twice in the same script returns two different values:

-- time.lua

local a = redis.call('time')
local b = redis.call('time')
return {a, b}

$ ./redis-cli --eval /app/time.lua

1) 1) "1638280442"
   2) "996960"
2) 1) "1638280442"
   2) "996966"
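To make the size of that gap concrete, here is a minimal sketch that converts each TIME reply (a pair of strings: Unix seconds and microseconds) into a single microsecond count and subtracts them. The helper name `time_reply_to_micros` is made up for this illustration; the input values are the ones from the run above.

```python
def time_reply_to_micros(reply):
    """Convert a Redis TIME reply [seconds, microseconds] to integer microseconds."""
    seconds, micros = reply
    return int(seconds) * 1_000_000 + int(micros)


# The two replies returned by time.lua in the run above.
a = ["1638280442", "996960"]
b = ["1638280442", "996966"]

gap = time_reply_to_micros(b) - time_reply_to_micros(a)
print(gap)  # 6
```

Two back-to-back calls were 6 microseconds apart in this run. Any slower work between two calls widens the gap accordingly.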

Checking the actual expiry times of two keys given the same relative TTL:

-- expire_check.lua

redis.call('set', 'foo', '1')
redis.call('expire', 'foo', 1)

-- slow calls...

redis.call('set', 'bar', '2')
redis.call('expire', 'bar', 1)

local fooExpiry = redis.call('PEXPIRETIME', 'foo')
local barExpiry = redis.call('PEXPIRETIME', 'bar')
return {fooExpiry, barExpiry}

$ ./redis-cli --eval /app/expire_check.lua

1) (integer) 1638280843717
2) (integer) 1638280843730

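In this run the two expiry times are 13 milliseconds apart, even though both keys were given the same one-second TTL in the same script. On top of that, expiry itself is not pinpoint accurate: a key may be removed anywhere between 0 and 1 millisecond after its expiry time.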

The implication is that there could be times at which some keys have expired but other related keys have not, and this could lead to an inconsistent state.
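The window of inconsistency can be sketched with a toy model. This is not Redis: `ToyStore` is a hypothetical in-memory stand-in with a manually advanced clock, and the 13 ms delay simply mirrors the gap observed in the run above. It only illustrates how two relative TTLs set at different moments produce different deadlines.

```python
class ToyStore:
    """Toy key store with a fake clock and per-key deadlines in ms. Not Redis."""

    def __init__(self):
        self.now_ms = 0
        self.deadlines = {}

    def advance(self, ms):
        # Move the fake clock forward.
        self.now_ms += ms

    def set_with_ttl(self, key, ttl_ms):
        # Relative expiry: the deadline depends on the clock at call time.
        self.deadlines[key] = self.now_ms + ttl_ms

    def exists(self, key):
        deadline = self.deadlines.get(key)
        return deadline is not None and self.now_ms < deadline


store = ToyStore()
store.set_with_ttl("foo", 1000)
store.advance(13)   # stand-in for the slow calls between the two writes
store.set_with_ttl("bar", 1000)

store.advance(990)  # the fake clock now reads 1003 ms
print(store.exists("foo"), store.exists("bar"))  # False True
```

Between 1000 ms and 1013 ms on the fake clock, `foo` is gone while `bar` is still present, which is exactly the kind of state a reader of related keys should never observe.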


Our solution

The solution is to use EXPIREAT to set an absolute expiry time for all related keys, rather than rely on a relative expiry time through the TTL.

The Redis documentation is not clear on whether keys with the same EXPIREAT setting are guaranteed to expire at the same time. To be cautious, we also reordered key expiry so that, regardless, we avoid inconsistency.

-- expire_new.lua

-- Unix time, in seconds
local now = redis.call('time')[1]
local expiry = now + 1

redis.call('set', 'foo', '1')
redis.call('expireat', 'foo', expiry)

-- slow calls...

redis.call('set', 'bar', '2')
redis.call('expireat', 'bar', expiry)

local fooExpiry = redis.call('PEXPIRETIME', 'foo')
local barExpiry = redis.call('PEXPIRETIME', 'bar')
return {now, fooExpiry, barExpiry}

$ ./redis-cli --eval /app/expire_new.lua

2) (integer) 1638281266000
3) (integer) 1638281266000
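The script above works at whole-second granularity, because it uses only the seconds element of the TIME reply. If sub-second precision matters, one possible variation (an assumption on our part here, not what the script above does) is to build a millisecond deadline from both elements of the reply and apply it with PEXPIREAT, which takes an absolute Unix time in milliseconds. The sketch below shows only the arithmetic; the helper name `deadline_ms` and the sample reply are hypothetical, and the commands are printed rather than sent to a server.

```python
def deadline_ms(time_reply, ttl_ms):
    """Build one absolute deadline (Unix ms) from a Redis TIME reply.

    time_reply is [seconds, microseconds], both as strings.
    """
    seconds, micros = time_reply
    now_ms = int(seconds) * 1000 + int(micros) // 1000
    return now_ms + ttl_ms


# Hypothetical TIME reply, read once at the start of the operation.
reply = ["1638281265", "250000"]
expiry = deadline_ms(reply, 1000)

# Every related key receives the same absolute deadline.
for key in ("foo", "bar"):
    print("PEXPIREAT", key, expiry)
```

The important property is unchanged from the EXPIREAT version: the deadline is computed once and reused for every related key, so slow calls between the writes no longer shift it.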

This is typical of one of the many engineering problems we troubleshoot and solve each week here at Ably.

Fancy working with us in the realtime sphere? Our engineers have a range of broad technology skills across infrastructure, security, distributed systems, and beyond.

You can find us on Twitter or LinkedIn, and apply to join us in one of our open roles.
