Learning to git bisect
Yesterday one of my students hit a bug in Brackets. We're working on an extension for Thimble that adds an autocomplete option to take a selfie when you are typing a URL that might be an image (e.g., <img src="...">
). It also needs to work in CSS, when you enter something like background-image: url(...)
. Except it doesn't work in the latter case. I advised him to file a bug in the Brackets repo, and the response was, "This used to work, must be a regression."
I told my student he should bisect and find the commit that caused the regression (i.e., code change that introduced the bug). This was a new idea for him, so I did it with him, and promised I'd write something later to walk through the process. Now that it's "later," I'm writing a short walkthrough on what I did.
As I write this, there are 16,064 commits in the adobe/brackets repo on github. That's a lot of coming and going, where a bug could easily hitch a ride and quietly find its way into the product. How does one hope to find what is likely a tiny needle in such a huge haystack? The answer is git bisect.
Having a large number of commits in which to locate a bad commit is only difficult if we try and manually deal with those commits. If I remembered that this code worked yesterday, I might quickly look through the diffs for everything that landed recently, and figure things out that way. But when we're talking about months, or maybe years since this last worked, that strategy won't be efficient.
Luckily git is designed to help us here. Because git knows about every commit, both what changed, and what change(s) came before, we can take a huge range of commits and slice and dice them in order to expose the first bad one.
In order for this process to work, you need a reproducible test case. In an ideal world, this is a unit test that's in your git history, and is stable across all the commits you'll be checking. In the real world you often have to write one yourself, or get a set of steps that quickly exposes the bug.
For the bug I was tracking, I had a simple way to test in the editor manually. When the code works, it should give us a list of filenames to use for the URL, and look like this:
When it fails, it keeps giving the list of completions for background-image
instead, and looks like this:
Now that I have a simple way to confirm whether or not the bug is present in a particular version of the code, I need to quickly eliminate commits and narrow down exactly where it came from.
You begin a bisect session with git by doing: git bisect start
. In order to have git bisect for me, I need to create a range, and to do this I need two end points (i.e., two commits): one where the bug is not present (good commit); and one where I know it's broken (bad commit). Finding a bad commit is usually pretty easy, since you already know you have the bug--in my case I can use master
. I tell git this is the bad commit by doing git bisect bad
(note: I was sitting on the master
branch when I typed this. You can also explicitly give a commit or branch/tag).
But for the last-known-good commit, I obviously don't know exactly where it happened. As a result, I'm going to need to overshoot and go back far enough to get to something that works.
Brackets is currently preparing for version 1.4 on master
, and there are 80 previous released (i.e., tagged) versions. These tagged versions are useful for quickly jumping back in time, since they represent versions of the editor that are most likely to run (i.e,. they didn't tag and release broken commits). So I start checking out old releases: 1.1 (broken), 1.0 (broken), release 0.44 (broken). It looks like this is an old problem, so I jump further back so as to not waste my time testing too many.
Eventually I checkout version 0.32 from Oct 4, 2013, and the bug isn't there. Now I can tell git about the other end of my bisect range by doing: git bisect good
.
Now git can do its magic. It will take my good and bad commits, and checkout a commit half way between them. It looks like this:
Bisecting: 3123 revisions left to test after this (roughly 12 steps)
[940fb105ecde14c7b5aab5191ec14e766e136648] Make the window the holder
A couple of things to notice. First, git has checked out commit 940fb105ecde14c7b5aab5191ec14e766e136648 automatically. It has let me know that there are 12 steps left before it can narrow down the problematic commit. That's not too bad, given that I'm trying to find one bad commit in thousands from the past two years!
At this point I need to run the editor for commit 940fb105ecde14c7b5aab5191ec14e766e136648 and test to see if it has the bug. If it does, I type git bisect bad
. If it doesn't, I type git bisect good
. In this case the bug is there, and I enter git bisect bad
. Git responds:
Bisecting: 1558 revisions left to test after this (roughly 11 steps)
[829d231440e7fa0399f8e12ef031ee3fbd268c79] Merge branch 'master' into PreferencesModel
A new commit has been checked out, eliminating half of the previous commits in the range (was 3132, now it's 1558), and there are 11 steps to go. This process continues, sometimes the bug is there, sometimes it isn't. After about 5 minutes I get to the first bad commit, and git shows me this:
6d638b2049d6e88cacbc7e0c4b2ba8fa3ca3c6f9 is the first bad commit
commit 6d638b2049d6e88cacbc7e0c4b2ba8fa3ca3c6f9
Author: <Name of Author>
Date: Mon Apr 7 15:23:47 2014 -0700
fix css value hints
:040000 040000 547987939b1271697d186c73533e044209169f3b 499abf3233f1316f75a83bf00acbb2955b777262 M src
Now I know which commit caused the bug to start happening. Neither git nor I know why this commit did what it did, but it doesn't matter. We also know what changed, which bug was being fixed, who did it, and when they did it. Armed with this info we can go talk to people on irc, and add notes to a few bugs: the bug where the bad commit was added (this will alter people who know about the code, and could more easily fix it), and our bug where we've filed the issue. Often you don't need to solve the bug, just find it, and let the person who knows the code well help with a fix.
The last step is to tell git that we're done bisecting: git bisect reset
. This takes us back to the commit we were on before we started our bisect.
Despite being quite skilled with git, I don't think that any of my students had used bisect before. Lots of people haven't. It's worth knowing how to do it for times like this when you need to quickly narrow down a regression.