How we scaled the GitHub API with a sharded, replicated rate limiter in Redis

Written by GitHub Engineering on April 5, 2021

About a year ago, we migrated an old rate limiter in order to serve more traffic and accommodate a more resilient platform architecture. We adopted a replicated Redis backend with client-side sharding. In the end, it worked out great, but we learned some lessons along the way.

The problem

We had an old rate limiter that was simple enough:
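In spirit, it was a fixed-window counter in Memcached, something like this sketch (the client library, key scheme, and numbers here are illustrative, not the original code):

require "dalli"  # a common Ruby Memcached client; an illustrative choice

LIMIT  = 5000      # requests allowed per window
WINDOW = 60 * 60   # one hour, in seconds

def under_limit?(memcached, client_id)
  # one counter per client per window; the key rolls over with the window
  window_start = Time.now.to_i / WINDOW * WINDOW
  key = "rate-limit:#{client_id}:#{window_start}"
  # increment, initializing the counter to 1 with a TTL if it doesn't exist yet
  count = memcached.incr(key, 1, WINDOW, 1)
  count <= LIMIT
end

memcached = Dalli::Client.new("localhost:11211")
under_limit?(memcached, "client-123") # => true while under the limit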

(There might have been more nuance to it, but that’s the main idea.)

However, this limiter had two problems: it needed to serve much more traffic than it was originally designed for, and its Memcached backend didn’t fit the more resilient platform architecture we were moving toward.

The proposed solution

After some discussion, we decided on a new design for the rate limiter: a replicated Redis backend, sharded on the client side, with the limiter logic implemented in Lua scripts that run inside Redis.

One option we considered but decided against was using our MySQL-backed KV store (GitHub::KV) for storage. We didn’t want to add traffic to already-busy MySQL primaries: usually, we use replicas for GET requests, but rate limit updates would require write access to a primary. By choosing a different storage backend, we could avoid the additional (and substantial) write traffic to MySQL.

Another advantage to using Redis is that it’s a well-traveled path: rate limiting is a well-documented Redis use case, and we could take inspiration from excellent existing resources.

The release

To roll out this change, we isolated the current persistence logic into a MemcachedBackend class, and built a new RedisBackend class for the rate limiter. We used a feature flag to gate access to the new backend. This allowed us to gradually increase the percentage of clients using the new backend. We could change the percentage without a deploy, which meant, if something went wrong, we could quickly switch back to the old implementation.
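Sketched in Ruby, the gate looked something like this (the feature-flag API here is hypothetical; GitHub’s actual flag system differs):

class Throttler
  def initialize(redis_backend:, memcached_backend:, flags:)
    @redis_backend = redis_backend
    @memcached_backend = memcached_backend
    @flags = flags
  end

  # pick a backend per client, based on a percentage-rollout flag
  def backend_for(client_id)
    if @flags.enabled?(:redis_rate_limiter, client_id) # hypothetical flag check
      @redis_backend
    else
      @memcached_backend
    end
  end
end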

The release went smoothly, and when it was done, we removed the feature flag and the MemcachedBackend class, and integrated RedisBackend directly with the Throttler class that delegated to it.

Then, the bug reports started flowing in…

The bugs

A lot of integrators watch their rate limit usage very closely. We got two really interesting bug reports in the weeks following our release:

  1. Some clients observed that their X-RateLimit-Reset header value “wobbled” – it might show 2020-01-01 10:00:00 for one request, but 2020-01-01 10:00:01 on another request (with one second difference).
  2. Some clients had their requests rejected for being over the limit, but the response headers said X-RateLimit-Remaining: 5000. That didn’t make sense: if they had a full rate limit window ahead of them, why was the request rejected?

How odd!

Fix 1: Manage “reset at” in application code

We were optimistic about using Redis’s built-in time-to-live (TTL) to implement our “reset at” feature. But as it turned out, my implementation caused the “wobble” described above.

The Lua script returned the TTL of the client’s rate limit value, and then in Ruby, it was added to Time.now to get a timestamp for the X-RateLimit-Reset header. The problem was, time passes between the call to TTL (in Redis) and the call to Time.now (in Ruby). Depending on exactly how much time passed, and where it fell relative to the clock’s second boundary, the resulting timestamp might differ between otherwise-identical requests. For example, consider the following calls:

Redis call begins   latency   TTL (Redis)   latency   Time.now (Ruby)   sum of TTL and Time.now
10:00:05.2          0.1       5             0.1       10:00:05.4        10:00:10.4
(then, a half-second later)
10:00:05.9          0.05      5             0.1       10:00:06.05       10:00:11.05

In that case, since the clock’s second boundary fell between the call to TTL and the call to Time.now, the resulting timestamp was one second bigger than the previous one: truncated to whole seconds for the header, that’s 10:00:10 versus 10:00:11.
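In code, the wobble-prone calculation amounted to something like this (a sketch; the key name is illustrative):

require "redis"

redis = Redis.new
# TTL is measured on the Redis server; Time.now runs in Ruby a moment later,
# so their sum can land on either side of a second boundary
reset_at = Time.now.to_i + redis.ttl("rate-limit:client-123")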

We could have tried increasing the precision of this operation (e.g., using Redis’s millisecond-resolution PTTL instead of TTL), but some wobble would have remained, even if greatly reduced.

Another possibility was to calculate the time using only Redis, instead of mixing Ruby and Redis calls: Redis’s TIME command could have been the single source of truth. (Old Redis versions didn’t allow TIME in Lua scripts, but Redis 5+ does.) We avoided this design because it would have been harder to test: with Ruby’s time as the source of truth, I could time-travel in my tests with Timecop, asserting that expired keys were handled correctly without actually waiting for Redis’s system clock to reach those future times. (I still had to wait on Redis to test the EXPIRE-based database cleanup, but since expires_at came from Ruby-land, I could inject very short expiration windows to simplify testing.)
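For example, with Ruby as the source of truth, a test can jump straight past the window boundary instead of sleeping (a sketch, assuming the timecop gem):

require "timecop"

expires_at = Time.now.to_i + 3600 # a "reset at" value computed in Ruby-land

Timecop.travel(Time.at(expires_at + 1)) do
  # inside the block, Ruby's clock reads a time past the window boundary,
  # so expiry logic keyed off Ruby time sees the window as closed
  raise "expected the window to be expired" unless expires_at < Time.now.to_i
end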

Instead, we decided to compute the “reset at” time in Ruby and persist it in the database. That way, we could be sure it wouldn’t wobble: the wobble was an artifact of the calculation, and reading a stored value back guarantees stability. So instead of reading TTL from Redis, we stored a second value per client (effectively doubling our storage footprint, but OK).

We still applied a TTL to rate limit keys, but they were set for one second after the “reset at” time. That way, we could use Redis’s own semantics to clean up “dead” rate limit windows.

Fix 2: Account for expiration in replicas

Weirdly, many clients reported rejections that included X-RateLimit-Remaining: 5000 headers. What was going on?!

We found another problem that worked like this:

  1. At the beginning of the request, check the client’s current rate limit value. If it’s over the maximum allowed limit, prepare a rejection response.
  2. Before delivering the response, increment the current rate limit value, and use the response to populate the X-RateLimit-... headers.

Well, it turned out that Step 1 above hit a Redis replica, since it was a read operation. The read operation returned information about the client’s previous window, and the application prepared a rejection response.

Then, Step 2 would hit a Redis primary. During that database call, Redis would expire the previous window’s data and return data for a fresh rate limit window. This is a known limitation of Redis: replicas don’t expire data until they receive instructions to do so from their primaries, and primaries don’t expire keys until they’re accessed (there’s a GitHub issue about this). (In fact, primaries do randomly sample keys from time to time, expiring them as appropriate; see “How Redis Expires Keys” in the Redis documentation.)
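Concretely, the unlucky sequence looked something like this sketch (the connection setup and key name are illustrative):

require "redis"

replica = Redis.new(url: "redis://replica.internal:6379") # read path
primary = Redis.new(url: "redis://primary.internal:6379") # write path

# Step 1 (read): the replica still holds the previous, logically-expired
# window, so the client appears to be over the limit
count = replica.get("rate-limit:client-123").to_i
reject = count >= 5000

# Step 2 (write): accessing the key on the primary finally expires the old
# window, so the headers get built from a brand-new one
new_count = primary.incr("rate-limit:client-123")
remaining = 5000 - new_count # ≈ a full window remaining, yet the request was rejected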

Addressing this issue required two things:

  1. When reading a client’s state (which may come from a replica), compare the stored “reset at” value to the current time in our own logic, and treat any window whose reset time has passed as if it doesn’t exist, rather than trusting Redis to have expired it.
  2. When writing on the primary, start a fresh window whenever the stored “reset at” time is in the past, instead of relying on the key’s TTL to have cleaned up the stale data.

The final scripts

Here’s the core Lua script we ended up with for implementing this pattern — the one that counts a request:

--   count a request for a client
--   and return the current state for the client
-- rename the inputs for clarity below
local rate_limit_key = KEYS[1]
local increment_amount = tonumber(ARGV[1])
local next_expires_at = tonumber(ARGV[2])
local current_time = tonumber(ARGV[3])
local expires_at_key = rate_limit_key .. ":exp"
local expires_at = tonumber(redis.call("get", expires_at_key))
if not expires_at or expires_at < current_time then
  -- brand-new window, or the old window has passed but Redis hasn't cleaned it up yet
  redis.call("set", rate_limit_key, 0)
  redis.call("set", expires_at_key, next_expires_at)
  -- expire one second *after* the reset time, so Redis's cleanup never removes a live window
  redis.call("expireat", rate_limit_key, next_expires_at + 1)
  redis.call("expireat", expires_at_key, next_expires_at + 1)
  expires_at = next_expires_at
end
local current = redis.call("incrby", rate_limit_key, increment_amount)
return { current, expires_at }
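From the Ruby side, the backend invokes that script with window parameters computed in Ruby-land, roughly like so (a sketch; the file name and one-hour window are illustrative, and production code would cache the script and call it via EVALSHA):

require "redis"

COUNT_SCRIPT = File.read("count_request.lua") # the script above

redis = Redis.new
now = Time.now.to_i
window = 3600                                 # one hour
next_expires_at = (now / window + 1) * window # next window boundary, from Ruby

current, expires_at = redis.eval(
  COUNT_SCRIPT,
  keys: ["rate-limit:client-123"],
  argv: [1, next_expires_at, now]
)
# `current` feeds X-RateLimit-Remaining; `expires_at` feeds X-RateLimit-Reset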


We’ve learned a lot from this new approach, but there’s still one shortcoming we’re considering: the current implementation doesn’t increment the “current” rate limit value until after the request has finished. We do this because we don’t charge clients for 304 Not Modified responses (which can happen when the client provides an ETag). A better implementation might increment the value when the request starts, then refund the client if the response is a 304. That would prevent some edge cases where a client can exceed its limit while its final allowed requests are still being processed.
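That refund flow might look roughly like this (a sketch; the backend interface is illustrative):

def count_request(backend, client_id)
  backend.increment(client_id, 1)    # charge up front, closing the race
  status = yield                     # handle the request
  backend.increment(client_id, -1) if status == 304 # refund uncharged responses
  status
end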

After working out the issues described in this post, the new rate limiter has worked great. It has improved reliability, fixed issues for clients, and reduced our support load (eventually 😉). And the architecture is ready for our next wave of platform improvements.

