Teaching Open Source, Fall 2021

As I often do at the end of a term of teaching open source, I want to record some details about what I've observed and learned.  This was the 16th time I've taught the course, and as always, the experience was unique, despite also being very familiar.

I've been doing open source in an academic setting long enough that I feel comfortable counting on open source contributions to produce a particular set of learning outcomes.  My experience this term has reinforced my belief that with proper support, open source is the best way to rapidly accelerate the growth, skill development, and confidence of new developers.

The course runs in two parallel tracks: 1) contributing 7-10 pull requests to existing open source repositories, slowly progressing in complexity and scope; and 2) completing a series of weekly labs to supplement and teach the required skills for 1).

Labs

For the second year in a row, I've approached the "getting started" problem by adding a series of labs that simulate the workflows and community interactions of real open source contributions.  In the past I relied on real contribution experiences to provide this knowledge, but it was often unevenly acquired between students.  The labs provide ready-made experiences (managing a repo, creating PRs, doing code reviews, dealing with complex git workflows, writing tests, CI pipelines).

I have the students write a small CLI program and incrementally enhance, refactor, test, and fix it over the term.  Last year it was a deadlink checker, and this year a static site generator.  It doesn't really matter what the program is, as it's a means to an end (although it also lets them explore a topic area).  Everyone has to develop their own version, work on bugs and features in other students' versions, and review and manage incoming changes to their own.  In doing so they get to experience a bit of what it's like being a contributor and a maintainer, and see the diversity of coding styles (different people solve the same problems in different ways) and languages.

I also let them work in any language that they want, which ended up looking like this:

Language Percent
JavaScript 55%
Python 15%
C++ 7%
C# 7%
Java 4%
TypeScript 4%
Rust 4%
Java 4%
Swift 2%

By the end the students are required to do a 1.0 version and ship a release to the most appropriate package manager for their language, be it npm, PyPI, crates.io, NuGet, Conan, Maven, etc.  Many students are amazed to learn that they can create software packages for others to use.  In addition, having to go through so many dot-releases each week helps prepare the students for the reality of software maintenance.  Poor choices in week 1 come back to haunt them in week 6: other students' PRs must be reviewed and fixed, bugs have to be corrected, somehow tests need to be added, and support for static analysis tooling, CI pipelines, etc. has to be put in place.  I try to have it be as real-world as possible without being any more complex than it has to be.  It's a balancing act I don't always get right, and I changed some labs this term based on what I learned last year.

Pull Requests

Once they have learned how to use git, GitHub, and the common workflows of open source contribution, we get started on making pull requests to real projects.  Here my approach is to encourage them to start small and progressively increase the complexity of their work.

Most students do ~7 pull requests beyond their labs.  The first ones are typically typo fixes, documentation, localization, web site improvements, and other beginner-friendly tasks.  As they move through the term, so too do the contributions grow in size and complexity.  By the end, many students are contributing new features to projects, or fixing bugs in very large repositories.  No one does the same thing, since they all work on real projects.  This largely deals with the cheating problem that other professors write about during COVID and remote learning.  In fact I encourage the students to collaborate and help one another (it's actually hard to make this happen, since they have all been taught that working with peers is cheating!).

I ran some scripts on the pull requests this term and this is what I saw:

  • Students: 54
  • Total Pull Requests: 392
  • Number of PRs Merged Already: 277 (70%)
  • Commits: 1,236
  • Lines of Code: +82,032/-41,632
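The scripts themselves aren't shown here.  A minimal sketch of that kind of summary, assuming each pull request has already been fetched into a simple record, might look like the following.  The `PullRequest` fields, the `summarize` helper, and the sample data are all hypothetical and are not the real course data.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical record for one pull request (field names are assumptions)."""
    author: str
    merged: bool
    commits: int
    additions: int
    deletions: int


def summarize(prs):
    """Aggregate the same kinds of totals as the list above."""
    total = len(prs)
    merged = sum(1 for pr in prs if pr.merged)
    return {
        "students": len({pr.author for pr in prs}),
        "total_prs": total,
        "merged_prs": merged,
        # Whole-number percentage, truncated; guard against an empty list.
        "merged_pct": 100 * merged // total if total else 0,
        "commits": sum(pr.commits for pr in prs),
        "additions": sum(pr.additions for pr in prs),
        "deletions": sum(pr.deletions for pr in prs),
    }


# Made-up sample data, only to show the shape of the result.
sample = [
    PullRequest("student-a", True, 3, 120, 40),
    PullRequest("student-a", False, 1, 10, 2),
    PullRequest("student-b", True, 5, 300, 150),
    PullRequest("student-b", True, 2, 45, 5),
]
stats = summarize(sample)
print(stats)
```

Keeping the aggregation separate from however the data is collected means the same function can be pointed at a different term's pull requests for a year-over-year comparison.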

These numbers are interesting to compare to historical data.  In 2019 there were almost twice as many PRs from a similar number of students, but almost the same number of lines of code was affected.  From this I conclude that this term's students were doing larger changesets in each PR.  I probably need to emphasize the value of smaller PRs next time.  I was also interested to see that the share of merged PRs had increased by 7 percentage points since 2019 (from 63% to 70%), which I think speaks to quality, as assessed by the repo maintainers.

As always, one of the most interesting things to observe is where they choose to work, and in which languages.  They work on both the smallest and largest projects, and everything in between.  Some of the more notable repos this term included:

The language breakdown was interesting, too.  The majority of PRs (81%) were done in one of the following languages:

Language % (2021) % (2019) Change
JavaScript 58% 51% +7%
TypeScript 11% 3.9% +7.1%
Python 9% 9% N/A
Java 3% 7% -4%

The remaining 19% of PRs were done in one of the following: C, C++, Rust, Go, Kotlin, C#, Swift, PHP, Dart, and Astro (which I'd never heard of before).

As an aside, I've had some people question me about why I focus so much on JS/TS/node/browsers, but the reality is that this is what students want: I let them choose any technology stack, and year after year they choose the web.  The results echo the Stack Overflow 2021 Survey results, or GitHub's 2021 State of the Octoverse stats.  Always bet on JS!

A good example of this was the work that Andrew Nguyen took on for his final set of pull requests. At the start of the term he talked about wanting to make some meaningful contributions to the web ecosystem.  His final submission included:

  1. adding a feature to Vercel's SWR React hook with tests
  2. writing the docs for it, including the localizations
  3. adding a follow-up edge case test
  4. fixing a bug in the TypeScript compiler

His blog about the experience speaks to how valuable it is to give students a chance to do this work while they are still in school:

Prior to enrolling in [the course], I deeply believed that people who worked on influential repos like React, Visual Studio Code or TypeScript must be developers with years of experience, and they were all tightly involved with the codebase from the start. Who in the right mind would walk into a completely unfamiliar codebase with thousands of commits and decide to land their first commit. Obviously, I was wrong. First of all, every repo has issues that can be solved by any newcomers. Those are usually labelled as "good first issue". Secondly, the people of open source are much much friendlier than I thought. They are always ready to give insights on solving the problem. I don't have to understand the whole codebase to start helping, usually a good first issue only relate to a small feature of the app.

The truth is, students make up an enormous part of the GitHub population, and it's important for the rest of the open source community to recognize their presence and contributions.  Here's GitHub's own stats for 2021:

From https://octoverse.github.com/#sustainable-communities

In 2022, I'm excited to build on the type of work that Andrew was doing.  Namely, I want the students to understand not only how to use open source, but also how to give back in positive ways to the dependencies and tools that they rely upon every day.  With bitcoin miners invading our npm packages and the recent log4j vulnerability, the software supply chain has never been as visible or fragile as it has been this fall.  We all need to work to support the efforts to keep it healthy and secure.

As good as it was this term, I'm completely spent and ready for a break from open source, computers, and teaching.  However, I'm really happy with what we accomplished together this term.  I think this was one of the more engaged groups I've worked with in a few years, which is saying a lot for students who never met in person, and did everything remotely.

Well done everyone.  I knew you could do it, and you did it well.