How we ship code faster and safer with feature flags

Written by GitHub Engineering / Original link on Apr. 27, 2021

At GitHub, we’re continually working to improve existing features and shipping new ones all the time. From our launch of GitHub Discussions to the release of manual approvals for GitHub Actions—in order to ship new features and improvements faster while lowering the risk in our deployments, we have a simple but powerful tool: feature flags.

Reducing deployment risk

We deploy changes around the clock, and we need to keep the service running with no disruptions. For these reasons, deploying needs to be an almost risk-free process. Any problem that comes up during a deployment requires a rollback of the code. During this rollback period, users could be impacted and other deployments delayed; the more time that elapses the bigger the impact.

We use feature flags to reduce the risk of deploying to production. Any potentially risky change is put behind a feature flag in the code and then, when the deployment is done, we enable the feature flag to everyone or to a percentage of actors. This way we minimize the impact of the new changes, and if something goes wrong we can disable the feature flag completely in a matter of seconds without interrupting other deployments. This is fundamental to us for the following reasons:

What kind of things can go wrong during a deployment, where a feature flag can help? These are some examples:

Working on new features

Feature flags are not only useful to reduce deployment risk when fixing or improving existing features, we also use them when developing new features!

Using feature flags allows us to work on features incrementally. Only staff members working on the project have the corresponding feature flag enabled, so they can see the new feature that is under development while other users cannot. We don’t use long-lived feature branches. Instead, feature flags allow us to work on small batches, which brings us many benefits:

In order to adopt this way of building new features, you need to do a little bit of additional planning upfront to divide the work. For example, a new feature typically needs new data models or changes in existing ones. If you create a pull request only for the data model changes, then you may get blocked until those changes are merged. There are a couple of things we usually do to prevent this:

This way of working could introduce some friction due to having to ask for reviews more frequently. Sometimes it can take days until a team reviews your pull request. To prevent this from being a blocker, we follow one or more of these strategies:

Keep working even if the changes are not merged into the main branch. Either on that “spike” pull request or branching off the branch that is waiting for reviews.

Have a first responder person in each team that is focused on reviewing pull requests quickly to unblock the progress of their own team or other teams.
Sometimes if a critical pull request is ready to be deployed sooner rather than later, but has outstanding non-critical feedback waiting to be addressed, we do that in follow up pull requests.

Testing features

We have to make sure that the new functionality works as expected while also maintaining the existing functionality. In order to do that, we have many mechanisms to enable or disable feature flags at all levels as follows:

Different shipping strategies

We already mentioned that we can flag particular actors, or we can enable a feature flag for a percentage of actors. Those are the most common options, but others include:

All these strategies and the full administration of feature flags is done through a web UI. Not all staff members have access to this UI, but most engineers do. The interface allows us to manage the shipping status of a feature flag, creation of new ones, and deletion. Additionally, we can see a history of changes made to the feature flag: who made the change, when and what changes were made, and other metadata, such as which team owns it.

What to flag

When we say that we can flag an actor, we mean that we can flag not only users, but other entities. In particular, we can flag users, organizations, teams, enterprises, repositories, or GitHub Apps. When writing the code that checks if the flag is enabled or not, we need to specify which actor has the flag enabled or not. These are some examples:

Sometimes we do more sophisticated checks, such as checking multiple actors or checking if a repository is public or not. For example, when we staff ship a feature it may make a new feature available for all repositories of an employee. If the feature must not be visible to other users, we want to enable the feature only for private repositories to prevent public repositories of an employee leaking the new feature.

The cost of a feature flag

As we have seen, using feature flags extensively changes the way we work, and requires a bit of additional planning and coordination. But there’s also additional costs associated with feature flags that we need to keep in mind. First, we have a runtime cost. When we check if a feature flag is enabled to an actor, we need to first load the metadata information of the flag, which is stored in MySQL but cached in memcached. We are now considering having the feature metadata and the list of actors always in memory to reduce this overhead. There’s an exception for this: we consider some feature flags “large.” An example of this was GitHub Actions, a feature that we rolled out to tens of thousands of users every week. For these large feature flags, when they are still not fully enabled, we need to do a per-actor query to MySQL to check if the actor is in the list of enabled actors.

For runtime, there’s also a cost in the form of technical debt. Once a feature flag is fully enabled, it leaves a lot of dead code and old tests in the repository that need to be deleted. Usually we do this manually after a feature has been rolled out for a few days or weeks depending on the case. But we are now automating the process: we have created a script that can be either invoked locally or by triggering a workflow manually.


The script uses regular expressions to find usages of the feature flag using git grep. Then for the matched Ruby files, it modifies the code using rubocop-ast. It is capable of deleting code blocks, such as if/else statements, and modifying boolean expressions, in addition to reindenting the code afterwards. When the file is a test, it deletes the code that enables the feature flag. For tests that check the behavior of the application when the feature flag is disabled, it just leaves a comment and makes the test fail, since in many cases we don’t want to just delete the test, but simply update it. As a result, even when the script requires some intervention from an engineer, it drastically reduces the manual work required in the process.

When the script is run using a workflow dispatch, it also creates a new branch and a pull request. Since we do extensive use of CODEOWNERS, the pull request will have the proper teams and people as reviewers. In the near future, we want to automate this process even more by running the script automatically when we can consider a feature flag stale.


Feature flags allow us to ship code faster with higher confidence. Using them we reduce deployment risk and make code reviews and development on new features easier, because we can work on small batches. There are also associated costs, but the benefits highly exceed them.

githubengineering githubengineering githubengineering

« Tinkerwell now with Apple Silicon support - Deepfake satellite imagery poses a not-so-distant threat, warn... »