Failure as contribution
Since I've got some time to kill while I wait for just about every piece of software I need on my box to rebuild, I thought I'd reflect on what I've been doing the past few evenings. I want to try to show what I mean by "contribution" when I ask my students to contribute to other projects. At the same time, I want to show how failure can itself be an important source of contribution.
I think a lot about failure. As a professor taking students into the bowels of a web browser, I have to think about failure. Failure is what will happen. It isn't something you can avoid. It's where you spend the majority of your time when you work at the scale of something like Mozilla. As I've written elsewhere, failure is data. Failure is information on the road to getting things done. You avoid failure at the risk of avoiding success.
Tonight I'm failing:
<@taras> humph: it's just not your day today
< humph> I'll pull a teachable moment out of it for my blog/students
< humph> otherwise it's a disaster
< dwitte> humph: lol, that's unfortunate
<@taras> humph: valgrind should tell you where it's going wrong
< dwitte> start with upgrading to 4.3.latest, since i've tested on that
< dwitte> after that, bust out the debugger :)
< dwitte> or valgrind!
Luckily Dan has kindly provided some documentation on building and using his tool (it's great to see that the Mozilla static analysis tools group all seem to do this). I began by building Treehydra and quickly hit a wall. Even though it built, it couldn't be used to build Mozilla. I did some checking and noticed that I was failing a bunch of Treehydra tests. I tried rebuilding a few other things to see if I'd missed something, and still couldn't get it to work. Asking on IRC brought no help. After deciding it was not just something stupid I was doing, I filed a bug.
A little while later, Taras offered to look at my box. After poking around for a while, he agreed that this was not just me, and represented a bug in Mozilla's JS implementation, SpiderMonkey. He offered to write a work-around patch for me if I'd help him by finding a regression window.
Regressions are bugs where you suddenly break something that previously worked. They are accidental side effects of making other changes, and often don't show up for some time. When you do notice them, you want to go back and figure out where you introduced the bug. In other words, you want a window of time in which the code went from 'working' to 'not working'. Finding regression windows can be time-consuming and, depending on the bug, laborious.
In my case I needed to try various versions of the SpiderMonkey code combined with my Treehydra build, and then run the build tests. Without version control, this would be a nightmare, as it would require downloading many builds and testing each one. However, with version control, and especially Mercurial (which keeps the entire history on your local machine), this is pretty easy.
Mercurial revisions are numbered two ways. First, there is a local incremental number (an integer). Second, there is a hash that represents the changeset globally, and is the same on everyone's machine. To make this job easier, I did what any sane programmer would do, and wrote a quick shell script that took one argument, the local revision to use for my update, and then updated, rebuilt, and tested.
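I won't reproduce my actual script here, but the idea can be sketched in a few lines. The version below is in Python rather than shell, and it is only an illustration: `hg update --rev` is the real Mercurial command, while the function names and the `make` / `make check` build and test commands are placeholders I've made up, not the real Treehydra steps.

```python
import subprocess


def build_steps(rev, build_cmd, test_cmd):
    """Return the commands to run, in order, for one local revision."""
    return [
        ["hg", "update", "--rev", str(rev)],  # check out the requested revision
        build_cmd,                            # placeholder build command
        test_cmd,                             # placeholder test command
    ]


def revision_passes(rev, build_cmd, test_cmd, run=subprocess.call):
    """Update, rebuild, and test; True only if every step exits with 0."""
    for cmd in build_steps(rev, build_cmd, test_cmd):
        if run(cmd) != 0:
            return False  # stop at the first failing step
    return True


def main(argv):
    """Usage: try_rev.py <local-revision>. Returns a shell exit status."""
    # "make" and "make check" are assumptions, not the real commands.
    ok = revision_passes(int(argv[1]), ["make"], ["make", "check"])
    print("PASS" if ok else "FAIL")
    return 0 if ok else 1
```

To use it as a script you would call `sys.exit(main(sys.argv))` at the bottom of the file. The `run` parameter is there only so the logic can be exercised without touching a real repository.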
Next I started making some guesses. Talking with people on IRC, I guessed that this had worked sometime in the past one to two months. I tried going back to September and running my script. Very quickly I found a revision where all the tests passed. After that it was a matter of bisecting the revisions between my current (failing) revision and the passing one some months back. After a dozen attempts, I had it: here was the last time it had worked.
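For anyone who hasn't bisected before, the search looks roughly like the sketch below. It is a simplification under stated assumptions: it treats local revision numbers as a straight line of history and assumes every revision after the bug landed fails. The function name and the revision numbers are made up for the example. Mercurial also ships an `hg bisect` command that automates this kind of search.

```python
def last_good_revision(good, bad, passes):
    """Binary search for the newest passing revision.

    Assumes `good` passes, `bad` fails, good < bad, and that once the
    bug lands every later revision fails.
    """
    if good >= bad:
        raise ValueError("good must be an earlier revision than bad")
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(mid):
            good = mid   # bug was introduced after mid
        else:
            bad = mid    # bug was introduced at or before mid
    return good


# Made-up numbers: pretend the regression landed in local revision 20137.
tried = []


def fake_passes(rev):
    tried.append(rev)  # record each revision we had to build and test
    return rev < 20137


print(last_good_revision(19000, 21000, fake_passes))  # prints 20136
```

Each attempt halves the remaining range, so a window of 2000 revisions needs at most 11 build-and-test runs. That is why a dozen attempts is enough to cover months of history.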
At the end of all this, I still have a broken build (it finished while I was writing this). I haven't fixed my problem yet. However, my problem turned out to be a bug that had been silently introduced a few months ago. Someone needed to fail for it to get found and fixed. Someone needed to do the work of figuring out when it got introduced. Someone also needs to fix it. But the fix only comes after it has been identified.
I tell this story to encourage my students (and all new contributors) not to lose heart when they fail at things. Your failure may be pointing at a much larger issue. In an open project like Mozilla, there are no personal bugs. The community owns them. You become part of that community when you contribute to finding and fixing them, and failure is how you get there. You can't contribute without failure.