Teaching Open Source: Sept 2018 - April 2019

Today I submitted my grades and completed another year of teaching.  I've spent the past few weeks marking student projects non-stop, which has included reading a lot of pull requests in my various open source courses.

As a way to keep myself sane while I marked, I wrote some code to do analysis of all the pull requests my students worked on during the fall and winter terms.  I've been teaching open source courses at Seneca since 2005, and I've always wanted to do this.  Now that I've got all of my students contributing to projects on GitHub, it's become much easier to collect this info via the amazing GitHub API.

I never teach these classes exactly the same way twice.  Some years I've had everyone work on different parts of a larger project, for example, implementing features in Firefox, or working on specific web tooling.  This year I took a different approach, and let each student be self-directed, giving them more freedom to choose whatever open source projects they wanted.  Having done so, I wanted to better understand the results, and what lessons I could draw for subsequent years.

GitHub Analysis

To begin, here are some of the numbers:

104 students participated in my open source courses from Sept 2018 to April 2019, some taking both the first and second courses I teach.

Together, these students made 1,014 Pull Requests to 308 Repositories.  The average number of PRs per student was 9.66 (mode=12, median=10, max=22).  Here's a breakdown of what happened with these PRs:

Total Percent
Merged 606 60%
Still Open 258 25%
Closed (Unmerged) 128 13%
Deleted (By Student) 22 2%

I was glad to see so many get merged, and so few get closed without full resolution.  There are lots of projects that are slow to respond to PRs, or never respond.  But the majority of the "Still Open" PRs are those that were completed in the last few weeks.

Next, I was really interested to see which languages the students would choose.  All of the students are in their final semesters of a programming diploma or degree, and have learned half-a-dozen programming languages by now.  What did they choose?  Here's the top of the list:

Language Total (PRs) Percent
JavaScript 508 51%
Python 85 9%
C++ 79 8%
Java 68 7%
TypeScript 39 3.9%
C# 34 3.4%
Swift 28 2.8%
Other 152 14.9%

In some ways, no surprises here. This list mirrors other similar lists I've seen around the web.  I suspect the sway toward JS also reflects my own influence in the courses, since I teach using a lot of case studies of JS projects.

The "Other" category is interesting to me.  Many students purposely chose to work in languages they didn't know in order to try something new (I love and encourage this, by the way).  Among these languages I saw PRs in all of:

Rust (16), Go (13), Kotlin (11), PHP (5), Ruby (3), Clojure (3), Lua (2), as well as Scala, F#, Dart, PowerShell, Assembly, GDScript, FreeMaker, and Vim script.

Over the weeks and months, my goal for the students is that they would progress, working on larger and more significant projects and pull requests.  I don't define progress or significant in absolute terms, since each student is at a different place when they arrive in the course, and progress can mean something quite different depending on your starting point.  That said, I was interested to see the kinds of "larger" and more "complex" projects the students chose.  Some examples of more recognizable projects and repos I saw:

They worked on editors (Notepad++, Neovim), blogging platforms (Ghost, WordPress), compilers (emscripten), blockchain, game engines, email apps, online books, linting tools, mapping tools, terminals, web front-end toolkits, and just about everything you can think of, including lots of things I wouldn't have thought of (or recommended!).

They also did lots and lots of very small contributions: typos, dead code removal, translation and localization, "good first issue," "help wanted," "hacktoberfest."  I saw everything.

Stories from Student Blogs

Along the way I had them write blog posts, and reflect on what they were learning, what was working and what wasn't, and how they felt.  Like all students, many do the bare minimum to meet this requirement; but some understand the power and reach of a good blog post.  I read some great ones this term.  Here are just a few stories of the many I enjoyed watching unfold.

I.

Julia pushed herself to work on a lot of different projects, from VSCode to Mozilla's Voice-Web to nodejs.  Of her contribution to node, she writes:

I think one of my proudest contributions to date was for Node.js. This is something I never would have imagined contributing to even just a  year ago.

We talk a lot, and openly, about imposter syndrome.  Open source gives students a chance to prove to themselves, and the world, that they are indeed capable of working at a high level.  Open source is hard, and when you're able to do it, and your work gets merged, it's very affirming on a personal level.  I love to see students realize they do in fact have something to contribute, that maybe they do belong.

Having gained this confidence working on node, Julia went on to really find her stride working within Microsoft's Fast-DNA project, fixing half-a-dozen issues during the winter term:

I’ve gotten to work with a team that seems dedicated to a strong  development process and code quality, which in turn helps me build good  habits when writing code.

Open source takes students out of the confines and limitations of a traditional academic project, and lets them work with professionals in industry, learning how they work, and how to build world-class software.

II.

Alexander was really keen to learn more about Python, ML, and data science.  In the fall he discovered the data analysis library Pandas, and slowly began learning how to contribute to the project.  At first he focused on bugs related to documentation and linting, which led to him learning how their extensive unit tests worked.  I think he was a bit surprised to discover just how much his experience with the unit tests would help him move forward to fixing bugs:

In the beginning, I had almost no idea what any of the functions did and I would get lost navigating through the directories when searching for something. Solving linting errors was a great start for me and was also challenging enough due to my lack of knowledge in open  source and the Pandas project specifically. Now I could identify where the issue originates from easily and also  write tests to ensure that the requested functionality works as expected. Solving the actual issue is still challenging because finding a solution to the actual problem requires the most time and research.  However, now I am able to solve real code problems in Pandas, which I  would not be able to do when I started. I'm proud of my progress...

Open source development tends to favour many small, incremental improvements vs. big changes, and this maps well to the best way for students to build confidence and learn: bit at a time, small steps on the road of discovery.

One of the many Pandas APIs that Alexander worked on was the dropna() function.  He fixed a number of bugs related to its implementation. Why bother fixing dropna?  With the recent black hole imaging announcement, I noticed that source code for the project was put on GitHub.  Within that code I thought it was interesting to discover Pandas and dropna() being used, and further, that it had been commented out due to a bug.  Was this fixed by one of Alexander's recent PRs?  Hard to say, but regardless, lots of future  scientists and researchers will benefit from his work to fix these bugs.  Software maintenance is rewarding work.

Over and over again during the year, I heard students discuss how surprised they were to find bugs in big software projects.  If you've been working on software for a long time, you know that all software has bugs.  But when you're new, it feels like only you make mistakes.

In the course I emphasize the importance of software maintenance, the value of fixing or removing existing code vs. always adding new features.  Alexander spent all his time maintaining Pandas and functions like dropna(), and I think it's an ideal way for students to get involved in the software stack.

III.

Volodymyr was interested to gain more experience developing for companies and projects he could put on his resume after he graduates.  Through the fall and winter he contributed to lots of big projects: Firefox, Firefox Focus, Brave for iOS, VSCode, and more.  Eventually he found his favourite, Airbnb's Lona project.

With repeated success and a trail of merged PRs, Volodymyr described being able to slowly overcome his feelings of self doubt: "I wasn’t sure if I was good enough to work on these bugs."

A real turning point for him came with a tweet from Dan Abramov, announcing a project to localize the React documentation:

I develop a lot using React, and I love this library a lot. I wanted to  contribute to the React community, and it was a great opportunity to do  it, so I applied for it. Shortly after it, the repository for Ukrainian  translation was created, and I was assigned to maintain it 🤠

Over the next three months, Volodymyr took on that task of maintaining a very active and high-profile localization project (96% complete as I write this), and in so doing learned all kinds of things about what it's like to be on the other side of a pull request, this time having to do reviews, difficult merges, learning how to keep community engaged, etc.  Seeing the work ship has been very rewarding.

Open source gives students a chance to show up, to take on responsibility, and become part of the larger community.  Having the opportunity to move from being a user to a contributor to a leader is unique and special.

My Own Personal Learning

Finally, I wanted to pause for a moment to consider some of the things I learned with this iteration of the courses.  In no particular order, here are some of the thoughts I've been having over the last week:

Mentoring 100+ students across 300+ projects and 28 programming languages is daunting for me.  Students come to me every day, all day, and ask for help with broken dev environments, problems with reviewers, issues with git, etc.  I miss having a single project where everyone works together, because it allows me to be more focused and helpful.  At the same time, the diversity of what the students did speaks to the value of embracing the chaos of open source in all its many incarnations.

Related to my previous point, I've felt a real lack of stable community around a lot of the projects the students worked in.  I don't mean there aren't people working on them.  Rather, it's not always easy to find ways to plug them in.  Mailing lists are no longer hot, and irc has mostly disappeared.  GitHub Issues usually aren't the right place to discuss things that aren't laser focused on a given bug, but students need places to go and talk about tools, underlying concepts in the code, and the like.  "What about Slack?"  Some projects have it, some don't.  Those that do don't always give invitations easily.  It's a bit of a mess, and I think it's a thing that's really missing.

Open source work on a Windows machine is still unnecessarily hard.  Most of my students use Windows machines.  I think this is partly due to cost, but also many of them simply like it as an operating system.  However, trying to get them involved in open source projects on a Windows machine is usually painful.  I can't believe how much time we waste getting basic things setup, installed, and building.  Please support Windows developers in your open source projects.

When we start the course, I often ask the students which languages they like, want to learn, and feel most comfortable using.  Over and over again I'm told "C/C++".  However, looking at the stats above, C-like languages only accounted for ~15% of all pull requests.  There's a disconnect between what students tell me they want to do, and what they eventually do.  I don't fully understand this, but my suspicion is that real-world C/C++ code is much more complicated than their previous academic work.

Every project thinks they know how to do open source the right way, and yet, they all do it differently.  It's somewhat hilarious for me to watch, from my perch atop 300+ repos.  If you only contribute to a handful of projects within a small ecosystem, you can start to assume that how "we" work is how "everyone" works.  It's not.  The processes for claiming a bug, making a PR, managing commits, etc. is different in just about every project.  Lots of them expect exactly the opposite behaviour!  It's confusing for students.  It's confusing for me.  It's confusing for everyone.

It's still too hard to match new developers with bugs in open source projects.  One of my students told me, "It was easier to find a husband than a good open source project to work in!"  There are hundreds of thousands of issues on GitHub that need a developer.  You'd think that 100+ students should have no problem finding good work to do.  And yet, I still find it's overly difficult.  It's a hard problem to solve on all sides: I've been in every position, and none of them are easy.  I think students waste a lot of time looking for the "right" project and "perfect" bug, and could likely get going on lots of things that don't initially look "perfect."  Until you have experience and confidence to dive into the unknown, you tend to want to work on things you feel you can do easily.  I need to continue to help students build more of this confidence earlier.  It happens, but it's not quick.

Students don't understand the difference between apps and the technologies out of which they are made.  Tools, libraries, frameworks, test automation--there is a world of opportunity for contribution just below the surface of visible computing.  Because these areas are unknown and mysterious to students, they don't tend to gravitate to them.  I need to find ways to change this.  Whenever I hear "I want to work on Android apps..." I despair a little.

Teaching open source in 2019 has really been a proxy for teaching git and GitHub.  While I did have some students work outside GitHub, it was really rare.  As such, students need a deep understanding of git and its various workflows, so this is what I've focused on in large part.  Within days of joining a project, students are expected to be able to branch, deal with remotes, rebase, squash commits, fix commit messages, and all sorts of other intermediate to advanced things with git.  I have to move fast to get them ready in time.

Despite all the horrible examples you'll see on Twitter, the open source community has, in large part, been really welcoming and kind to the majority of my students.  I'm continually amazed how much time maintainers will take with reviews, answering questions, and helping new people get started.  It's not uncommon for one of my students to start working on a project, and all of a sudden be talking to its creator, who is patiently walking them through some setup problem. Open source isn't always a loving place (I could tell you some awful stories, too).  But the good outweighs the bad, and I'm still happy to take students there.

Conclusion

I'm ready for a break, but I've also had a lot of fun, and been inspired by many of my best students.  I'm hoping I'll be able to teach these courses again in the fall.  Until then, I'll continue to reflect on what worked and what didn't, and try to improve things next time.

In the meantime, I'll mention that I could use your support.  Doing this work is hard, and requires a lot of my time.  In the past I've had companies like Mozilla generously help me stay on track.  If you or your company would like to find ways to partner or support this work, please get in touch.  Also, if you're hiring new developers or interns, please consider hiring some of these amazing students I've been teaching.  I know they would be grateful to talk to you as well.

Thanks for your continued interest in what we're doing.  I see lots of you out there in the wild, doing reviews, commenting on pull requests, giving students a favourite on Twitter, leaving a comment in their blog.  Open source works because people take the time to help one another.

Show Comments