Teaching Open Source, Fall 2019

Today I've completed another semester of teaching open source, and wanted to write something about what happened, experiments I tried, and what I learned.

This fall I taught the first of our two open source classes, cross-listed in our degree and diploma programs as OSD600 and DPS909. This course focuses on getting students engaged in open source development practices, and has them participate in half a dozen different open source projects, making an average of 10 pull requests over 14 weeks. The emphasis is on learning git, GitHub, and how to cope in large open source projects, code bases, and communities.

This is the 15th year I've taught it, and I had one of my largest groups: 60 students spread across two sections. I don't think I could cope with more than this, especially when I'm also teaching other courses at the same time.

Running the Numbers

I ran an analysis this morning, and here's some data on the work the students did:

665 Pull Requests (2,048 commits) to 266 GitHub Repositories
425 have already been merged (64%)
+85,873/-25,510 lines changed in 2,049 files

They worked on all kinds of things, big and small. I kept a list of projects and organizations I knew while I was marking this week, and some of what I saw included:

Lots from Mozilla, including: mozilla/addons-code-manager, mozilla/hubs, mozilla/web-ext, mozilla/foundation.mozilla.org, mozilla-lockwise/lockwise-android, firefox-devtools/profiler
All kinds of things with Microsoft (more than any other org), including microsoft/vscode, microsoft/vscode-vsce, microsoft/STL, microsoft/accessibility-insights-web, microsoft/calculator, microsoft/react-native-windows, microsoft/fast-dna, xtermjs/xterm.js, dotnet/winforms
Google: google/blockly, google/coding-with-chrome, angular/angular
ruby/www.ruby-lang.org
Lots of localization with GatsbyJS: gatsbyjs/gatsby-vi, gatsbyjs/gatsby-es, gatsbyjs/gatsby-ru
moment/moment
bitcoin/bitcoin
elastic/kibana, elastic/eui
facebook/create-react-app
Quite a few PRs to WordPress: wordpress-mobile/WordPress-iOS, WordPress/gutenberg
nodejs/node
A few to projects run by NASA: nasa-gibs/worldview, nasa/PSP
home-assistant/home-assistant, home-assistant/home-assistant-polymer
zeit/hyper
uber/baseweb

Whatever they worked on, I encouraged the students to progress as they went, which I define as building on previous experience, and trying to do a bit more with each PR. "More" could mean working on a larger project, moving from a smaller "good first issue" type fix to something larger, or fixing multiple small bugs where previously they only did one. I'm interested in seeing growth.

Personal Experiences

The students find their way into lots of projects I've never heard of, or wouldn't know to recommend. By following their own passions and interests, fascinating things happen.

For example, one student fixed a bunch of things in knitcodemonkey/hexagon-quilt-map, a web app for creating quilt patterns. Another got deeply involved in the community of a service mesh project called Layer5. A few women in one of my sections got excited about Microsoft’s recently open sourced C++ Standard Library. If you'd asked me which projects students would work on, I wouldn't have guessed the STL; and yet it turned out to be a really great fit. One of the students wrote about it in her blog:

Why do I always pick issues from Microsoft/STL? It's because of the way they explain each bug. It is very well defined, there is a description of how to reproduce the bug and files that are crashing or need updates. Another reason is that there are only 20 contributors to this project and I'm 8th by the amount of contributing (even though it is only 31 line of code). The contributors are very quick to respond if I need any help or to review my PR.

Working on projects like those I've listed above isn't easy, but it has its own set of rewards, not least that it adds useful experience to the students' resumes. As one student put it in his blog:

I have officially contributed to Facebook, Angular, Microsoft, Mozilla and many more projects (feels kinda nice to say it).

Another wrote:

I contribute to various repositories all over the world and my work is being accepted and appreciated! My work and I are valued by the community of the Software Developers!

And another put it this way:

The most important thing — I am now a real Open-Source Developer!

Becoming a real open source developer means dealing with real problems, too. One student put it well in his blog:

Programming is not an easy thing.

No, it isn't. Despite the positive results, if you talked to the students during the labs, you would have heard them complaining about all sorts of problems doing this work, from wasting time finding things to work on, to problems with setting up their development environments, to difficulties understanding the code. However, regardless of their complaints, most manage to get things done, and a lot do quite interesting work.

There's no doubt that having real deadlines, and a course grade to motivate them to find and finish open source work, gets a lot more of them involved than if they were doing this on the side. The students who don't take these courses could get involved in open source, but don't tend to. A lot more people are capable of this work than realize it. They just need to put in the time.

The Devil's in the Details

I wish I could say that I've found a guaranteed strategy to get students to come to class or do their homework, but alas, I haven't. Not all students do put in the time, and for them, this can be a really frustrating and defeating process, as they find out that you can't do this sort of work last minute. They might be able to stay up all night and cram for a test, or write a half-hearted paper; but you can't fix software bugs this way. Everything that can go wrong will (especially for those using Windows), and these Yaks won't shave themselves. You need hours of uninterrupted development, patience, and time to communicate with the community.

One of the themes that kept repeating in my head this term is that work like this is all about paying attention to small details. I'm amazed when I meet students taking an advanced programming course who can't be bothered with the details of this work. I don't mean being able to answer trivia questions about the latest tech craze. Rather, I mean being attuned to the importance and interplay of software, versions, libraries, tools, operating systems, syntax, and the like. Computers aren't forgiving. They don't care how hard you try. If you aren't interested in examining software at the cellular level, it's really hard to breathe life into source code.

Everything matters. Students are amazed when they have to fix their commit messages ("too long", "wrong format", "reference this issue..."); squash and rebase commits (I warned them!); fix formatting (or vice versa when their editor got too ambitious autoformatting unrelated lines); change the names of variables; add comments; remove comments; add tests; fix tests; update version numbers; avoid updating version numbers! sign CLAs; add their name to AUTHORS files; don't add their name to other files! pay attention to the failures on CI; ignore these other CI failures.

Thankfully, a lot of them do understand this, and as they travel further down the stacks of the software they use, and fix bugs deep inside massive programs, the value of small, delicate changes starts to make sense. One of my students sent me a note to explain why one of her PRs was so small. She had worked for weeks on a fix to a Google project, and in the end, all that investment of research, time, and debugging had resulted in a single line of code being changed. Luckily I don't grade submissions by counting lines of code. To me, this was a perfect fix. I tried to reassure her by pointing out all the bugs she hadn't added by including more code than necessary. Google agreed, and merged her fix.

Something New

This term I also decided to try something new. We do a bunch of case studies, looking at successful open source projects (e.g., Redis, Prettier, VSCode) and I wanted to try and build an open source project together with the whole class using as much of the same tech and processes as possible.

I always get students involved in "external" projects (projects like those mentioned above, not run by me). But by adding an "internal" project (one we run), a whole new kind of learning takes place. Rather than showing up to an existing project, submitting a pull request and then moving on to the next, having our own project meant that students had to learn what it's like to be on the other side of a pull request, to become a maintainer vs. a contributor.

I've done smaller versions of this in the past, where I let students work in groups. But to be honest it rarely works out the way I want. Most students don't have enough experience designing and implementing software to be able to build something, and especially not something with lots of opportunity for people to contribute in parallel.

Our project was an RSS/Atom blog aggregator service and frontend called Telescope. The 60 of us wrote it in a month, and I'm pleased to say that "it works!" (though it doesn't "Just Work" yet). I recorded a visualization of the development process and put it on YouTube.

A day in the life of Telescope development

I've written a lot of open source software with students before, but never this many at once. It was an interesting experience. Here are some of my take-aways:

I tried to file Issues and review PRs vs. writing code. We got to 500 Issues and PRs by the end, and I filed and reviewed hundreds of these. Most students are still learning how to file bugs, and how to think about decomposing a feature into actionable steps. As the weeks went on, they started to get the hang of it. But in the end, I had to write the first bit of code to get people to join me. I also had to be patient to let the code evolve in "interesting" ways, and have faith we'd get to where we need to be in the end (we did).

No matter what I did (lessons in class, writing docs, helping people in the lab), people managed to get their git repos in a mess. Because so many of the external open source projects the students worked on require a rebase workflow, I encouraged the same in our project. However, that didn't stop people from merging master into their branches over and over again, sometimes a dozen times in the same pull request. I still haven't figured out how to teach this perfectly. I love git, but it's still really hard for beginners to use properly. In the end I often did rebases myself to help get students out of a mess. But despite the problems, most people got the hang of it after a few weeks. One of the students put it this way in his blog: "This summer I knew very little about git/GitHub…it was pure hell. Now we are jumping between branches and rebasing commits as gracefully and confident as squirrels."

Having a mix of macOS, Linux, and Windows meant we had to spend a lot of time on environment issues. I think this is good, because it helps prepare students for the realities of modern programming. However, I didn't love fighting with Windows on so, so many minor things. Despite what I read online, I remain unconvinced that Windows is ready to be used as a primary development environment.

As we slowly added automation, tooling, and CI to the project, things got so much better. One of the students added eslint with Airbnb's style guide. This was interesting to watch, because it is very particular about spacing, and it made everyone have to adjust their work. Later, we added Prettier and a .vscode directory with default extensions and project settings for line-endings, format-on-save, etc. This fixed 99% of the issues we'd had previously, though we still needed a .gitattributes fix for Windows line endings to make git happy in Windows CI.

We used Jest for automated testing, and most students got to work on writing tests for the first time. When I showed them how test coverage worked, a lot of them got really interested in increasing test coverage. They also had to contend with lots of interesting issues, like rate-limits in live APIs vs. using mocks for network requests, dealing with test isolation and ordering issues, environment issues in CI that didn't manifest locally, and learning to write code defensively.
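The trade-off between hitting a live API and mocking the network can be sketched without any test framework. This is a minimal illustration, not Telescope's code: the names FeedFetcher, countItemsInXml, countItems, and fakeFetcher, and the example.com URL, are all hypothetical. The idea is that code which receives its fetcher as a parameter can be tested with canned data, so tests never touch the network or run into rate limits.

```typescript
// All names here are hypothetical and chosen only for illustration.
// A fetcher is any function that turns a URL into the text of a feed.
type FeedFetcher = (url: string) => Promise<string>;

// Pure helper: counts opening <item> tags in an RSS document.
function countItemsInXml(xml: string): number {
  const matches = xml.match(/<item[\s\/>]/g);
  return matches ? matches.length : 0;
}

// The fetcher is passed in, so production code can supply a real HTTP
// client while tests supply a fake.
async function countItems(url: string, fetchFeed: FeedFetcher): Promise<number> {
  const xml = await fetchFeed(url);
  return countItemsInXml(xml);
}

// Fake fetcher for tests: returns canned XML and never makes a request.
const fakeFetcher: FeedFetcher = async () =>
  "<rss><channel><item><title>A</title></item><item><title>B</title></item></channel></rss>";

countItems("https://example.com/feed", fakeFetcher).then((count) => {
  console.log(count); // prints 2
});
```

In Jest the same shape is usually achieved with mock functions or module mocks, but the principle is the same: the test controls what the "network" returns.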

We used Docker and docker-compose to start our various services. This was new for almost everyone, and quite a few students got really interested in how containers work. Again, Windows made this an extremely frustrating experience for many students, who could never get it to run properly, and we had to work around that, too.

We tried to implement various pieces of the 12 Factor approach, a lot of which were new to them. For example, using environment variables. Over the weeks they learned how various things can go wrong, like having authentication tokens slip into git. It's invaluable learning, and making mistakes is how you do it.
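As a rough sketch of what reading configuration from environment variables looks like, assuming made-up variable names (FEED_URL and API_TOKEN are hypothetical, not Telescope's actual settings): the values live outside the repository, and the program fails fast when one is missing instead of falling back to a secret hard-coded in source.

```typescript
// Hypothetical configuration shape and variable names, for illustration only.
interface Config {
  feedUrl: string;
  apiToken: string;
}

// The environment is passed in as a plain object so the function is easy to test.
type Env = Record<string, string | undefined>;

function loadConfig(env: Env): Config {
  const required = ["FEED_URL", "API_TOKEN"];
  // Collect every missing name so the error reports them all at once.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
  return {
    feedUrl: env["FEED_URL"] as string,
    apiToken: env["API_TOKEN"] as string,
  };
}
```

In a Node.js service this would typically be called once at startup as loadConfig(process.env). Because the token is never written into a source file, there is nothing for git to pick up by accident.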

I started with package-lock.json in git, but eventually removed it and went to exact versioning in package.json. The amount of churn the lock file caused with this many people was unworkable, especially with people learning git and npm at the same time.
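To make "exact versioning" concrete: an exact version is a plain x.y.z string, with no ^ or ~ range prefix. The check below is a hypothetical sketch of how such a rule could be enforced (for instance in a CI step); the function name, the regular expression, and the sample version numbers are my own illustration, not something taken from the project.

```typescript
// Hypothetical helper: lists dependencies that are not pinned to an exact version.
type Dependencies = Record<string, string>;

// Matches plain x.y.z versions, optionally followed by a prerelease tag.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function findNonExact(deps: Dependencies): string[] {
  return Object.keys(deps)
    .filter((name) => !EXACT_VERSION.test(deps[name]))
    .map((name) => name + "@" + deps[name]);
}

// Sample data with made-up version numbers.
const sample: Dependencies = {
  jest: "24.9.0",
  eslint: "^6.6.0",
  prettier: "~1.19.1",
};

console.log(findNonExact(sample)); // [ 'eslint@^6.6.0', 'prettier@~1.19.1' ]
```

Pinning direct dependencies this way keeps everyone on the same top-level versions, though unlike a lock file it does not pin transitive dependencies.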

The pace of development was frenetic. For some people, this was a plus, and it got them excited. For others, it was a turn off, and helped to further isolate them when they couldn't get their environment to work, or when they had to continually rebase branches that couldn't get merged fast enough. It was imperative that I be there to help keep things moving, jump in with tricky rebases and merges, and help to finish certain problematic fixes. For this to work, you really need a few "senior" people who can help keep things moving.

Despite what I wrote above, students will not ask for help. Not in person. Not on Slack. Not in issues. It's not a tooling problem. By putting them all together in one place, the natural tendency to not want to look stupid gets multiplied, and people become less collaborative, rather than more. They want to show up with something finished vs. in process and broken. I set up a Slack channel, and it was mostly empty. However, a whole bunch of private chat rooms and Discords sprouted in parallel. It's hard to know how to make this work perfectly. I try to model "not knowing" and asking questions, but it takes a long time to become comfortable working this way. I know I don't know everything, and that's OK. Students are still figuring out how they feel about this.

Students reviewing each other's (and my) code is really valuable. I made all the students Collaborators, and set GitHub so that you needed 2 approvals to merge any code into master. Sometimes I used my admin rights to merge something fast, but for the most part this worked really well. One of the students commented in his blog how much he'd learned just by reviewing everyone's code. We also talked a lot about how to write changes so they'd be more reviewable, and this got better over time, especially as we studied Google's Code Review process. It's a real skill to know how to do a review, and unless you get to practice it, it's hard to make much progress. We also got to see the fallout of bad reviews, which let test failures land, or which approved the removal of code that should have stayed.

In the winter term I'm going to continue working on the project in the second course, and take them through the process of shipping it into production. Moving from fixing a bug to fixing enough bugs in order to ship is yet another experience they need to have. If you want to join us, you'd be welcome to do so.

All in all, it was a busy term, and I'm ready for a break. I continue to be incredibly grateful to all the projects and maintainers who reviewed code and worked with my students this time. Thank you to all of you. I couldn't do any of this if you weren't so open and willing for us to show up in your projects.