How we scaled the GitHub API with a sharded, replicated rate limiter in Redis
About a year ago, we migrated an old rate limiter in order to serve more traffic and accommodate a more resilient platform architecture. We adopted a replicated Redis backend with client-side sharding. In the end, it worked out great, but we learned some lessons along the way.
We had an old rate limiter that was simple enough:
- For every request, determine a “key” for the current rate limit
- In Memcached, increment the value of that key, setting it to 1 if there wasn’t any current value
- Also, if there wasn’t already one, set a “reset at” value in Memcached, using a related key (eg, “
- When incrementing, if the “reset at” value is in the past, ignore the existing value and set a new “reset at”
- At the beginning of each request, if the value for the key is above the limit, and “reset at” is in the future, then reject the request
(There might have been more nuance to it, but that’s the main idea.)
However, this limiter had two problems:
- Our Memcached architecture was due to change. Since it was mostly used as a caching layer, we were going to switch from a single, shared Memcached to one Memcached per datacenter. Although that’d work fine for application caching, it would cause our rate limiter to behave very strangely if client requests were routed to different data centers.
- Memcached “persistence” wasn’t working for us. The Memcached backend was shared by the rate limiter and other application caches which meant that, when it filled up, it would sometimes evict rate limiter data, even when it was still active. (As a result, clients would get “fresh” rate limit windows when they shouldn’t. Sometimes, only one key would be evicted – they’d keep the same “used” value, but get new, future, “reset at” values!)
After some discussion, we decided on a new design for the rate limiter:
- Use Redis, since it has a more appropriate persistence system and simple sharding and replication setups
- Shard inside the application: the app would pick, for each key, which Redis cluster to read and write from
- To mitigate the CPU-bound nature of Redis, put a single primary (for writes) and several replicas (for reads) in each cluster
- Instead of writing “reset at” in the database, use Redis expiration to make values disappear when they no longer apply
- Implement the storage logic in Lua, to guarantee atomicity of operations (this was an improvement over the previous design)
One option we considered but decided against was using our MySQL-backed KV store (GitHub::KV) for storage. We didn’t want to add traffic to already-busy MySQL primaries: usually, we use replicas for GET requests, but rate limit updates would require write access to a primary. By choosing a different storage backend, we could avoid the additional (and substantial) write traffic to MySQL.
Another advantage to using Redis is that it’s a well-traveled path. We could take inspiration from two excellent existing resources:
- Redis’s own documentation, which includes some rate limiter patterns
- Stripe’s technical blog post, “Scaling your API with Rate Limiters”, which includes a Ruby and Redis example implementation
To roll out this change, we isolated the current persistence logic into a
MemcachedBackend class, and built a new
RedisBackend class for the rate limiter. We used a feature flag to gate access to the new backend. This allowed us to gradually increase the percentage of clients using the new backend. We could change the percentage without a deploy, which meant, if something went wrong, we could quickly switch back to the old implementation.
The release went smoothly, and when it was done, we removed the feature flag and the
MemcachedBackend class, and integrated
RedisBackend directly with the
Throttler class that delegated to it.
Then, the bug reports started flowing in….
A lot of integrators watch their rate limit usage very closely. We got two really interesting bug reports in the weeks following our release:
- Some clients observed that their
X-RateLimit-Resetheader value “wobbled” – it might show
2020-01-01 10:00:00for one request, but
2020-01-01 10:00:01on another request (with one second difference).
- Some clients had their requests rejected for being over the limit, but the response headers said
X-RateLimit-Remaining: 5000. That didn’t make sense: if they had a full rate limit window ahead of them, why was the request rejected?
We were optimistic about using Redis’s built-in time-to-live (TTL) to implement our “reset at” feature. But it turns out, my implementation caused the “wobble” described above.
The Lua script returned the TTL of the client’s rate limit value, and then in Ruby, it was added to
Time.now.to_i to get a timestamp for the
X-RateLimit-Reset header. The problem was, time passes between the call to TTL (in Redis) and
Time.now.to_i (in Ruby). Depending exactly how much time, and where it fell on the clock’s second boundary, the resulting timestamp might be different. For example, consider the following calls:
|Redis call begins||latency||TTL (Redis)||latency||Time.now returns||sum of TTL and Time.now|
|(then, a half-second later)|
In that case, since the second boundary happened between the call to TTL and
Time.now, the resulting timestamp was one second bigger than the previous ones.
We could have tried increasing the precision of this operation (eg, Redis PTTL), but there would still have been some wobble, even if it was greatly reduced.
Another possibility was to calculate the time using only Redis, instead of mixing Ruby and Redis calls to create it. Redis’s
TIME command could have been used as the source of truth. (Old Redis versions didn’t allow
TIME in Lua scripts, but Redis 5+ does.) We avoided this design because it would have been harder to test: by using Ruby’s time as the source of truth, I could time-travel in my tests with Timecop, asserting that expired keys were handled correctly without actually waiting for Redis’s calls to the system clock to return true, future times. (I still had to wait on Redis to test the
EXPIRE-based database cleanup, but since
expires_at came from Ruby-land, I could inject very short expiration windows to simplify testing.)
Instead, we decided to persist the “reset at” time from Ruby in the database. That way, we could be sure it wouldn’t wobble. (Wobbling was an effect of the calculation – but reading from the database would guarantee a stable value.) Instead of reading TTL from Redis, we stored another value in the database (effectively doubling our storage footprint, but OK).
We still applied a TTL to rate limit keys, but they were set for one second after the “reset at” time. That way, we could use Redis’s own semantics to clean up “dead” rate limit windows.
Weirdly, many clients reported rejections that included
X-RateLimit-Remaining: 5000 headers. What’s going on!?
We found another problem that worked like this:
- At the beginning of the request, check the client’s current rate limit value. If it’s over the maximum allowed limit, prepare a rejection response.
- Before delivering the response, increment the current rate limit value, and use the response to populate the
Well, it turned out that Step 1 above hit a Redis replica, since it was a read operation. The read operation returned information about the client’s previous window, and the application prepared a rejection response.
Then, Step 2 would hit a Redis primary. During that database call, Redis would expire the previous window data and return data for a fresh rate limit. This is a known limitation of Redis: replicas don’t expire data until they receive instructions to do so from their primaries, and primaries don’t expire keys until they’re accessed (GitHub issue). (In fact, primaries do randomly sample keys from time to time, expiring them as appropriate, see “How Redis Expires Keys”.)
Addressing this issue required two things:
- Basically, the same fix as above: instead of relying on Redis’s TTL to expire old rate limit windows, we needed to manage that feature in the application. (The application should be prepared to read stale data from replicas, then ignore it.)
- Even after fixing that, a better design was required: in the case of rate-limited requests, we should avoid a second call to the database. The client’s window might expire between the two calls, resulting in the kind of inconsistent response described above. This fix required improving the Ruby code that prepared responses so that the response from Step 1 above was used to populate
Here are the Lua scripts we ended up with for implementing this pattern:
-- RATE_SCRIPT: -- count a request for a client -- and return the current state for the client -- rename the inputs for clarity below local rate_limit_key = KEYS local increment_amount = tonumber(ARGV) local next_expires_at = tonumber(ARGV) local current_time = tonumber(ARGV) local expires_at_key = rate_limit_key .. ":exp" local expires_at = tonumber(redis.call("get", expires_at_key)) if not expires_at or expires_at
We’ve learned a lot from this new approach, but there’s still one shortcoming we’re considering: the current implementation doesn’t increment the “current” rate limit value until after the request is finished. We do this because we don’t charge clients for
304 Not Modified responses (this can happen when the client provides an E-Tag). A better implementation might increment the value when the request starts, then refunds the client if the response is
304. That would prevent some edge cases where a client can exceed its limit when the final allowed request is still being processed.
After working out the issues described in this post, the new rate limiter has worked great. It has improved reliability, fixed issues for clients, and reduced our support load (eventually ) and the architecture is ready for our next wave of platform improvements.