PHPnews.io

Write Better Commits, Build Better Projects

Written by GitHub Engineering. Originally published on Jun. 30, 2022.

How often have you found yourself thinking:

These questions all reflect the limitations of collaboratively-developed source code as a communication medium. While there are ways to mitigate these issues (code comments, style guides, documentation requirements), we still inevitably find ourselves spending hours just trying to understand code. Luckily, the tools needed to solve these problems have been here all along!

Commits in Git repositories are more than just save points or logs of incremental progress in a larger project. In the words of GitHub’s “Git Guides”:

[Commits] are snapshots of your entire repository at specific times…based around logical units of change. Over time, commits should tell a story of the history of your repository and how it came to be the way that it currently is.


Commits are a firsthand historical record of exactly how and why each line of code came to be. They even come with human-readable messages! As a result, a repository’s commit history is the best tool a developer can use to explain and understand code.

It’s been my experience that commits are most effective when they’re tweaked and polished to deliberately convey a message to their audiences: reviewers, other contributors, and even your future self. This post will:

  1. Introduce some guidelines for organizing and revising commits.
  2. Outline pragmatic approaches for applying those guidelines.
  3. Describe some of the practical applications of a well-crafted commit history.

Writing better commits

Software development involves a lot of creativity, so your commits will reflect the context of your changes, your goals, and your personal style. The guidelines below are presented to help you utilize that creative voice to make your commits effective tools of communication.

As you read these guidelines, don’t worry about how you’ll be able to utilize all of this advice in the midst of writing code. Although you may naturally incorporate them into your development process with practice, each can be applied iteratively after you’ve written all of your code.

Structure the narrative

Like your favorite novel, a series of commits has a narrative structure that contextualizes the “plot” of your change with the code. Before any polishing, the narrative of a branch typically reflects an improvised stream of consciousness. It might contain:

Although an accurate retelling of your journey, a branch like this tells a “story” that is neither coherent nor memorable.

The problem

Disorganized commits that eschew a clear narrative will affect two people: the reviewer, and the developer themself.

Reviewing commit-by-commit is the easiest way to avoid being overwhelmed by the changes in a sufficiently large pull request. If those commits do not tell a singular, easy-to-follow story, the reviewer will need to context-switch as the author’s commits jump from topic to topic. To confirm that earlier commits properly set up later ones (for example, that a newly-created function is used properly), the reviewer ultimately needs to piece together the narrative on their own: for each commit, they must figure out which earlier changes establish the relevant background context, then tediously click back and forth between them. Alternatively, they’ll remember some vague details and simply assume earlier commits properly set up later ones, failing to identify potential issues.

But how does a scatterbrained narrative hurt the developer? A developer’s first instinct when working on a new project is often to hack on it until they get something functional. Fluctuating between “fun” and “frustrating,” this approach eventually yields good results, but it’s far from efficient. Jumping in without a plan (that is, without the mindset of following a narrative) makes that process slower than it needs to be.

The solution

Outline your narrative, and reorganize your commits to match it.

The narrative told by your commits is the vehicle by which you convey the meaning of your changes. Also, like a story, it can take on many structures or forms.

Your branch is your story to tell. While the narrative is up to you, here are some editorial tips on how to keep it organized:

DO: Write an outline and include it in the pull request description.
DON’T: Wait until the end to form the outline; try using it to guide your work!

DO: Stick to one high-level concept per branch.
DON’T: Go down a tangentially-related “rabbit hole.”

DO: Add your “implement feature” commit immediately after the refactoring that sets it up.
DON’T: Jump back and forth between topics throughout your branch.

DO: Treat commits as “building blocks” of different types: bugfix, refactor, stylistic change, feature, etc.
DON’T: Mix multiple building block types in a single commit.

How do I do it?

_The branch used in this example can be found here – clone it and follow along for some hands-on practice!_

Suppose I’ve been working on a script that lets me load an image, make some change to it (for example, invert the colors), then either display it or save the modified image to a new location. My branch – which I’ve named feature/image-modifier – currently looks like this (displayed with newest commits at the bottom):

[Image: commit history of the feature/image-modifier branch]

Before modifying the branch, I want to outline the narrative. In this case, my “story” is:

  1. Create the basic script (no options, just read & display the image).
  2. Add --output option.
  3. Add image manipulation options (--invert and --grey/--gray).
  4. Add GitHub Actions CI for basic linting.

To reorder the commits to match this outline, I’ll use an interactive rebase, invoked with git rebase -i.

git rebase -i main

After initiating the rebase, my default editor opens with the git-rebase-todo file listing commits ordered from oldest (top) to newest (bottom). Now, I simply cut and paste the lines into the new order I want for the branch:

Before

pick 6a885eb WIP
pick 692f477 Finish script
pick b3348a0 Add --invert and --grey
pick 9512893 Add --output option
pick 1689371 Add GitHub Actions CI .yml
pick 6af4476 Add requirements.txt + other build fixes
pick 9cd6412 Let users use --gray option spelling
# Rebase 00a3ff6..9cd6412 onto 00a3ff6 (7 commands)
# ...

After

pick 6a885eb WIP
pick 692f477 Finish script
pick 9512893 Add --output option                        # moved
pick b3348a0 Add --invert and --grey
pick 9cd6412 Let users use --gray option spelling       # moved
pick 1689371 Add GitHub Actions CI .yml
pick 6af4476 Add requirements.txt + other build fixes
# Rebase 00a3ff6..9cd6412 onto 00a3ff6 (7 commands)
# ...

Once I save and close my editor, the rebase begins. I encounter some minor rebase conflicts, but I’m able to resolve them and run git rebase --continue until all remaining commits are applied to the branch in their new order:

[Image: commit history of the feature/image-modifier branch after reordering]
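Cutting and pasting in an editor is the simplest way to reorder commits, but the same step can also be scripted: Git runs the program named in the GIT_SEQUENCE_EDITOR environment variable with the path to the git-rebase-todo file as its last argument. The sketch below is a hypothetical helper (reorder_todo, main, and the script name are illustrative, not part of Git). It assumes every command line looks like "<command> <hash> <subject>" and that you pass the abbreviated hashes exactly as they appear in the todo file.

```python
import sys


def reorder_todo(lines, order):
    """Return the todo lines with command lines rearranged to follow `order`.

    `order` lists abbreviated commit hashes exactly as they appear in the todo
    file. Comment and blank lines are kept, in their original order, after the
    reordered commands.
    """
    commands, others = [], []
    for line in lines:
        if line.strip() and not line.lstrip().startswith("#"):
            commands.append(line)
        else:
            others.append(line)
    # Assumes each command line has the form "<command> <hash> <subject>".
    by_hash = {line.split()[1]: line for line in commands}
    if len(order) != len(commands) or set(order) != set(by_hash):
        raise ValueError("order must name every commit in the todo exactly once")
    return [by_hash[commit] for commit in order] + others


def main(todo_path, order):
    with open(todo_path, encoding="utf-8") as todo:
        lines = todo.read().splitlines()
    with open(todo_path, "w", encoding="utf-8") as todo:
        todo.write("\n".join(reorder_todo(lines, order)) + "\n")


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: reorder_todo.py HASH... TODO_FILE", file=sys.stderr)
    else:
        # Git appends the todo file's path as the final argument.
        main(sys.argv[-1], sys.argv[1:-1])
```

For the reordering above, the invocation would be GIT_SEQUENCE_EDITOR="python3 reorder_todo.py 6a885eb 692f477 9512893 b3348a0 9cd6412 1689371 6af4476" git rebase -i main, assuming the script is saved as reorder_todo.py. Any conflicts still have to be resolved by hand, exactly as with manual editing.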

Resize and stabilize the commits

Although the structure of a commit series can tell the high-level story of an author’s feature, it’s the code within each commit that creates software. Code itself can be complex, dense, and cryptic, but in order to collaborate, others need to understand it.

The problem

The cognitive burden of parsing code is exacerbated by having either too much or not enough information presented at once. Too much, and your reader will need to read and understand multiple conceptually-different topics that could get jumbled, misinterpreted, or simply missed; too little, and your reader will develop an incomplete mental model of a change.

For a reviewer, one of the big benefits of a commit-by-commit review is – like individual lectures in a semester-long course – pacing the development of their mental model with small, easy-to-digest changes. When a large commit doesn’t provide that sustainable learning pace, the reviewer may fail to identify questionable architectural decisions because they conflate unrelated topics, or even miss a bug because it’s in a section seemingly irrelevant to the impacted feature.

You might think reviewers’ problems would be solved with commits as small as possible, but an incomplete change leaves them unable to evaluate it fully as they read it. When a later commit “completes” the change, a reviewer may not easily draw connections to the earlier context. This is made worse when a later commit undoes something from the earlier, partial commit. The “churn” in these situations leads to the same weakened mental model – and same consequences – as when dealing with too-large commits.

Poorly-sized commits present more tangible issues as well. Most apparent is the inability to roll back your repository to an arbitrary commit and get a working state (for example, when debugging a strange feature interaction). Incomplete changes often fail to build, so a developer will be stuck searching nearby commits for a fix. Similarly, a bug narrowed down to a massive commit requires teasing apart its intermixed changes, a task potentially more difficult than it was during the initial review due to the loss of institutional project knowledge over time.

The solution

Make each commit both “small” and “atomic.”

To best convey your story, commits should minimize the effort needed to build a mental model of the changes they introduce. Since that effort depends on presenting a “just right” amount of information, the key to a good commit is fitting within well-defined upper and lower bounds on that information.

A small commit is one with minimal scope; it does one “thing.” This often correlates to minimizing the modified lines of code, but that isn’t a firm requirement. For example, changing the name of a commonly-used function may modify hundreds of lines of code, but its constrained scope makes it simple to both explain and review.

A commit is atomic when it is a stable, independent unit of change. In concrete terms, a repository should still build, pass tests, and generally function if rolled back to that exact commit without needing other changes. In an atomic commit, your reader will have everything they need to evaluate the change in the commit itself.

How do I do it?

Let’s return to the “image modifier” script from earlier. Where we last left it, the feature/image-modifier branch looked like this:

[Image: commit history of the feature/image-modifier branch]

Commits can be split and combined to adjust their size and scope. I’ll start by identifying the commits I want to split:
3f1e287 (Add --invert and --grey) – while this commit is small, it contains two distinct “things”: the --invert option, and the --grey option. In keeping with the principle of smallness in scope, then, I want to create one commit for each option.
60f352d (Add requirements.txt + other build fixes) – the addition of requirements.txt is necessary for the script to run successfully in any context, whereas the “other build fixes” all pertain specifically to the GitHub Actions CI definition. Because the two deal with ultimately different issues, relevant to different aspects of the narrative, I will split them.

To split them, I’ll follow the method described in the Git docs and rebase with the commits I want to split marked with edit, or e:
Before

pick 6a885eb WIP
pick 692f477 Finish script
pick d897cc6 Add --output option
pick 3f1e287 Add --invert and --grey
pick 1c26b4c Let users use --gray option spelling
pick d113307 Add GitHub Actions CI .yml
pick 60f352d Add requirements.txt + other build fixes

After

pick 6a885eb WIP
pick 692f477 Finish script
pick d897cc6 Add --output option
edit 3f1e287 Add --invert and --grey # pick -> edit
pick 1c26b4c Let users use --gray option spelling
pick d113307 Add GitHub Actions CI .yml
edit 60f352d Add requirements.txt + other build fixes # pick -> edit

Upon saving and closing my editor, the rebase applies commits until it reaches the first split candidate. For each split, I undo the original commit with git reset HEAD^ (which keeps its changes in the working tree), then divide its contents into two appropriately-sized new ones (using git add -p and git commit), and finally continue the rebase with git rebase --continue. When the rebase completes, the commits are split as indicated earlier:
[Image: commit history of the feature/image-modifier branch after splitting]

The next step is to combine the commits that are incomplete or otherwise too small:
6a885eb (WIP) and 692f477 (Finish script) – the former is an incomplete commit, and the latter completes it to form the minimal stable version of the script.
692f477 (Finish script) and 54b3e6d (Create requirements.txt) – requirements.txt is needed to install dependencies for the initial commit’s “minimal” script.
9c049cb (Add GitHub Actions CI .yml) and 73e6da5 (Fix CI build) – the CI fixes are all corrections to the initial GitHub Actions script needed for the build to succeed.

I do this with another interactive rebase, this time using fixup (or f) or squash (or s) depending on whether I’m silently correcting unintentional issues in a commit or combining commits with an entirely new message, respectively:

Before

pick 6a885eb WIP
pick 692f477 Finish script
pick d897cc6 Add --output option
pick 18dc6af Add --invert option
pick babc825 Add --grey option
pick 851f2a0 Let users use --gray option spelling
pick 9c049cb Add GitHub Actions CI .yml
pick 54b3e6d Create requirements.txt
pick 73e6da5 Fix CI build

After

pick 6a885eb WIP
squash 692f477 Finish script # pick -> squash
fixup 54b3e6d Create requirements.txt # moved, pick -> fixup
pick d897cc6 Add --output option
pick 18dc6af Add --invert option
pick babc825 Add --grey option
pick 851f2a0 Let users use --gray option spelling
pick 9c049cb Add GitHub Actions CI .yml
fixup 73e6da5 Fix CI build # pick -> fixup

When I close the editor, I’m stopped to create a new commit message for the combined “WIP” and “Finish script” commits due to the use of squash:

# This is a combination of 2 commits.
# This is the 1st commit message:

WIP

# This is the commit message #2:

Finish script

Now reads image from input

# Please enter the commit message for your changes. Lines starting
...

I’ll change the commit message to “Create initial image modifier script,” then save and close my editor to continue. The remaining fixups are applied without any conflicts, so the rebase completes with commits combined as specified:
[Image: commit history of the feature/image-modifier branch after combining]
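To make the difference between squash and fixup concrete, here is a deliberately simplified model of what happens to commit messages during such a rebase. It is not Git’s implementation: it ignores the code changes (which both commands meld into the previous commit) and the editor session, and the apply_todo function and its (command, message) pairs are illustrative assumptions.

```python
def apply_todo(todo):
    """Simplified model of how pick, squash, and fixup combine commit messages.

    `todo` is a list of (command, message) pairs in rebase order. Returns one
    message per commit in the rewritten history. A squash appends its message
    to the previous commit's draft (which you would then edit); a fixup melds
    into the previous commit and its message is discarded.
    """
    messages = []
    for command, message in todo:
        if command == "pick":
            messages.append(message)
        elif command in ("squash", "fixup"):
            if not messages:
                raise ValueError(f"'{command}' needs an earlier commit to meld into")
            if command == "squash":
                messages[-1] += "\n\n" + message
        else:
            raise ValueError(f"command not covered by this model: {command}")
    return messages
```

Applied to the todo list above, the model yields six commits; the first has a draft message combining “WIP” and “Finish script,” which is the draft I replaced with “Create initial image modifier script.”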

Explain the context

Commits are more than just the code they contain. Despite there being no shortage of jokes about them, commit messages are an extremely valuable – but often overlooked – component of a commit. Most importantly, they’re an opportunity to speak directly to your audience and explain a change in your own terms.

The problem

Even with a clear narrative and appropriately-sized commits, a niche change can still leave readers confused. This is especially true in large or open-source projects, where a reviewer or other future reader (even yourself!) likely won’t be clued into the implementation details or nuances of the code you’ve changed.

Code is rarely as self-evident as the author may believe, and even simple changes can be prone to misinterpretation. For example, what may appear to be a bug may instead be a feature implemented to solve an unrelated problem. Without understanding the intent of the original change, a developer may inadvertently modify an expected user-facing behavior. Conversely, something that appears intentional may have been a bug in the first place. A misinterpretation could cause a developer to enshrine a small mistake as a “feature” that hurts user experience for years.

Even in a best-case scenario, poorly explained changes will slow down reviewers and contributors as they attempt to interpret code, unnecessarily wasting everyone’s time and energy.

The solution

Describe what you’re doing and why you’re doing it in the commit message.

Because you’re writing for an audience, the content of a commit message should clearly communicate what readers need to understand. As the developer, you should already know the background and implementation well enough to explain them. Rather than write excessively long code comments (which are prone to becoming obsolete) or put everything into a monolithic pull request description, you can use commit messages to provide piecewise clarification of each change.

“What” and “why” break down further into high- and low-level details, all of which can be framed as four questions to answer in each commit message:

High-level (strategic):
  What you’re doing: Intent (what does this accomplish?)
  Why you’re doing it: Context (why does the code do what it does now?)
Low-level (tactical):
  What you’re doing: Implementation (what did you do to accomplish your goal?)
  Why you’re doing it: Justification (why is this change being made?)
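As a rough template, the four answers can be assembled into a message with the intent as a one-line title, a blank line, and then paragraphs for the context, justification, and implementation. The build_commit_message helper below is hypothetical, and the 72-column body wrap is a common convention assumed here, not something Git enforces.

```python
import textwrap


def build_commit_message(intent, context, justification, implementation, width=72):
    """Assemble a commit message: the intent as a one-line title, a blank line,
    then the remaining answers as wrapped paragraphs (empty ones are skipped)."""
    if "\n" in intent:
        raise ValueError("the intent should fit on a single title line")
    paragraphs = [
        textwrap.fill(text.strip(), width=width)
        for text in (context, justification, implementation)
        if text and text.strip()
    ]
    if not paragraphs:
        return intent
    # Git treats the first line as the title; the blank line separates the body.
    return intent + "\n\n" + "\n\n".join(paragraphs)
```

Simple changes may only need one or two of the body paragraphs, which is why empty answers are skipped rather than rejected.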

How do I do it?

Looking back once again at the feature/image-modifier branch:

[Image: commit history of the feature/image-modifier branch]

Before submitting for review, each of these commits should be rewritten to incorporate the intent, implementation, context, and justification of their respective change. For the sake of this example, I’ll reword only the “Let users use --gray option spelling” commit.

This commit isn’t the most recent one on the branch (in which case I could use git commit --amend to modify it), so I again need to use an interactive rebase to make changes to it. This time, the only thing I need to do is change the commit from pick to reword (or r):
Before

pick 096ee13 Create initial image modifier script
pick 381d3af Add --output option
pick 3e5e5f6 Add --invert option
pick 2d164e2 Add --grey option
pick bfa3f6f Let users use --gray option spelling
pick 3bf4ec4 Add GitHub Actions CI .yml

After

pick 096ee13 Create initial image modifier script
pick 381d3af Add --output option
pick 3e5e5f6 Add --invert option
pick 2d164e2 Add --grey option
reword bfa3f6f Let users use --gray option spelling # pick -> reword
pick 3bf4ec4 Add GitHub Actions CI .yml

After saving and closing the git-rebase-todo file, a new editor instance opens with the original commit message:

Let users use --gray option spelling
# Please enter the commit message for your changes. Lines starting
...

This message – like many one-line commit messages – only contains a vague description of the intent and is completely missing an explanation of how the change was implemented or why this change is needed.

When revising my commits, I like to order my “what” and “why” sections as follows:

  1. Intent (as the title)
  2. Context
  3. Justification
  4. Implementation

Applying that outline to the commit I’m editing here, I write approximately one (short) sentence for each item:

Add '--gray' option alias for '--grey'

If a user provides the option '--gray' (rather than the valid option
'--grey') to the image modifier script, they receive the error:

    image-modifier.py: error: unrecognized arguments: --gray

Because both "grey" and "gray" are common spellings of the same word, allow
both to indicate the convert-to-greyscale operation by adding "--gray" to
the 'argparse' argument definition.

The above message covers all of the context a reader might need, but is a bit of an over-explanation for such a simple change. While it is essential that the message includes some form of intent/implementation/context/justification, it’s perfectly acceptable to tailor the level of detail to the complexity of the change. For example, a more concise – but still fully-explained – commit message might instead look like:

Add '--gray' option alias for '--grey'

Include '--gray' as an alternative name for '--grey' in the 'argparse'
definition so that users can specify either common spelling for the option.
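For reference, the change both messages describe might look like the following minimal sketch. It assumes, as the messages say, that the script defines its options with argparse; the build_parser function, the prog name, and the positional input argument are illustrative stand-ins rather than the actual script.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="image-modifier.py")
    # Hypothetical positional argument for the image to modify.
    parser.add_argument("input", help="path to the image to modify")
    # Passing two option strings makes '--gray' an alias for '--grey'.
    # argparse derives the destination from the first long option, so
    # either spelling sets args.grey.
    parser.add_argument(
        "--grey", "--gray",
        action="store_true",
        help="convert the image to greyscale",
    )
    return parser
```

Without the second option string, parse_args() exits with the "unrecognized arguments: --gray" error quoted in the first message.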

Building better projects

Using the guidelines established above, you can mitigate the challenges of common software development tasks including code review, finding bugs, and root cause analysis.

Code review

Reviewing even the largest pull requests can be a manageable, straightforward process if you are able to evaluate changes on a commit-by-commit basis. Each of the guidelines detailed earlier focuses on making the commits readable; to extract information from commits, you can use the guidelines as a template.

  1. Determine the narrative by reading the pull request description and list of commits. If the commits seem to jump between topics or address more than one high-level concept, leave a comment asking for clarification or changes.
  2. Lightly scan the message and contents of each commit, starting from the beginning of the branch. Verify smallness and atomicity by checking that the commit does one thing and that it doesn’t include any incomplete implementations. Recommend splitting or combining commits that are incorrectly scoped.
  3. Thoroughly read each commit. Ensure the commit message sufficiently explains the code by first checking that the implementation matches the intent, then that the code matches the stated implementation. Use the context and justification to guide your understanding of the code. If any of the requisite information is missing, ask the author for clarification.
  4. Finally, with a complete mental model of the commit’s changes and the overarching narrative, confirm the code is efficient and bug-free.

Finding bugs with git bisect

If you’ve ever found yourself with a broken deployment and no idea when the breakage was introduced, git bisect is the tool for you. Specifically, git bisect is a tool built into Git that, when given a known-good commit (for example, your last stable deployment) and a known-bad commit (the broken one), will perform a binary search of the commits in the middle to find which one introduced the error.
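The core idea can be sketched in a few lines. This is a simplified model of the search, not Git’s implementation, which also copes with merge histories, skipped commits, and more; the commits list and the is_bad predicate are assumptions standing in for “check out this commit and test it.”

```python
def first_bad_commit(commits, is_bad):
    """Find the first bad commit in a linear history by binary search.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is also
    bad. Returns the first bad commit and how many commits had to be tested.
    """
    if len(commits) < 2:
        raise ValueError("need at least a known-good and a known-bad commit")
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # the bug was introduced at or before mid
        else:
            good = mid  # the bug was introduced after mid
    return commits[bad], tests
```

Because each test halves the remaining range, a range of about 100 commits needs at most seven tests.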

As useful as git bisect is, it absolutely requires that each commit it traverses is both atomic and small. If not atomic, you will be unable to test for repository stability at each commit; if not small, the source commit of your bug may be so large that you end up inefficiently reading code line-by-line to find the bug anyway.
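git bisect run automates the search by running a command at each step and reading its exit code: 0 marks the commit good, 125 tells Git the commit cannot be tested and should be skipped, and any other code from 1 to 127 marks it bad. The skip code is where non-atomic commits cost you, since every untestable commit makes the final answer less precise. The helpers below are a hypothetical sketch of such a check; the build and test commands are placeholders you would supply for your own project.

```python
import subprocess

# Exit codes that `git bisect run` interprets.
GOOD, BAD, SKIP = 0, 1, 125


def verdict(builds, bug_reproduced):
    """Translate the result of checking one commit into a bisect exit code."""
    if not builds:
        # The commit cannot be tested, e.g. a half-finished, non-atomic change.
        return SKIP
    return BAD if bug_reproduced else GOOD


def succeeds(command):
    """Run a command (a list of arguments) and report whether it exited with 0."""
    return subprocess.run(command).returncode == 0


def check_commit(build_cmd, test_cmd):
    """Build the checked-out commit, then run the bug test only if the build worked."""
    builds = succeeds(build_cmd)
    bug_reproduced = builds and not succeeds(test_cmd)
    return verdict(builds, bug_reproduced)
```

A check script would end with sys.exit(check_commit(build_cmd, test_cmd)) and be driven by git bisect start <bad> <good> followed by git bisect run python3 <script>.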

Root cause analysis

Suppose you’ve used something like git bisect to isolate the source commit of a bug. If you’re lucky, the underlying problem is obvious and you can fix it immediately. More often than not, things aren’t so simple; the bug-causing code might be necessary for another feature, or might not make sense as the source of the error you’re seeing. You need to understand why the code was written, and to do that, you can again use the commit history to investigate.

There are two main tools to help you search through commits: git blame and git log.

git blame annotates each line of a file with the commit that last changed that line:

$ git blame -s my-file.py
abd52642da46 my-file.py 1) import os
603ab927a0dd oldname.py 3) import re
603ab927a0dd oldname.py 4)
603ab927a0dd oldname.py 5) print("Hello world")
abd52642da46 my-file.py 5) print(os.stat("README"))

This can be particularly helpful in finding which commits modify the same area of code, which you can then read to determine if they interact poorly.
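If you want to script this kind of lookup, the layout shown above is easy to parse. The sketch below assumes exactly that git blame -s layout (a hash, an optional file name, a line number followed by ")", then the line’s content); parse_blame and commits_touching are hypothetical helpers, and the line range is counted by position in the output. For anything robust, git blame --porcelain is the machine-readable format Git provides for this purpose.

```python
import re

# Hash (with "^" for boundary commits), optional file name,
# line number followed by ")", then the line's content.
BLAME_LINE = re.compile(r"^\^?([0-9a-f]+) +(?:(\S+) +)?(\d+)\) ?(.*)$")


def parse_blame(output):
    """Return (commit, filename_or_None, content) tuples, one per blamed line."""
    entries = []
    for raw in output.splitlines():
        match = BLAME_LINE.match(raw)
        if match is None:
            raise ValueError(f"unexpected blame line: {raw!r}")
        commit, filename, _number, content = match.groups()
        entries.append((commit, filename, content))
    return entries


def commits_touching(output, first, last):
    """List the distinct commits that last changed lines first..last (1-based,
    inclusive), in order of first appearance."""
    seen = []
    for commit, _filename, _content in parse_blame(output)[first - 1:last]:
        if commit not in seen:
            seen.append(commit)
    return seen
```

Each commit returned is a candidate to read with git show when you suspect two changes interact poorly.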

For a more generalized commit search, you can use git log. In its simplest form, git log will display a list of commits in reverse chronological order, starting at HEAD:

$ git log --oneline
09823ba09de1 README.md: update project title
abd52642da46 my-file.py: add README stat printout
7392d7dbb9ae my-file.py: rename from oldname.py
5ad823d1bc48 test.py: commonize test setup
603ab927a0dd oldname.py: create printout script
...

The displayed list of commits can also be filtered by file(s), by function name, by line range in a file, by commit message text, and more. As with git blame, these filtered lists of commits can help you build a complete mental model of the changes that comprise a particular file or function, ultimately guiding you to the root cause of your bug.
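These filters correspond to real git log options: a path after -- limits the list to commits touching that file, -L:<funcname>:<file> and -L<start>,<end>:<file> trace a function or a line range, and --grep=<pattern> matches commit message text. The hypothetical helpers below only assemble those argument lists (you could pass them to subprocess.run); they do not run Git themselves.

```python
def log_by_file(path):
    # Commits that touched the file; "--" separates revisions from paths.
    return ["git", "log", "--oneline", "--", path]


def log_by_function(path, funcname):
    # History of one function; Git treats `funcname` as a regular expression
    # and locates the function by its definition line in `path`.
    return ["git", "log", "--oneline", f"-L:{funcname}:{path}"]


def log_by_lines(path, start, end):
    # History of a line range (1-based, inclusive) in `path`.
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return ["git", "log", "--oneline", f"-L{start},{end}:{path}"]


def log_by_message(pattern):
    # Commits whose message matches the regular expression `pattern`.
    return ["git", "log", "--oneline", f"--grep={pattern}"]
```

The -L forms also print each commit’s changes to the traced range. For Python files, adding a "*.py diff=python" line to .gitattributes may help Git recognize function definitions, since it lets Git use its built-in Python rules.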

Final thoughts

Although subjective and sometimes difficult to quantify, commit quality can make a massive difference in developer quality-of-life on any project: old or new, large or small, open- or closed-source. To make commit refinement part of your own development process, some guidelines to follow are:

  1. Organize your commits into a narrative.
  2. Make each commit both small and atomic.
  3. Explain the “what” and “why” of your change in the commit message.

These guidelines, as well as their practical applications, demonstrate how powerful commits can be when used to contextualize code. Regardless of what you do with them, commits will tell your project’s story; these strategies will help you make it a good one.
