Edge Cases

Yesterday I was looking at a bug with some students. It related to a problem in a file search feature: a user found that having a folder-name surrounded with braces (e.g., {name}) meant that no search results were ever reported within the folder. "This isn't a bug," one of them told me, "because no one would ever do this." I found this fascinating on a number of levels, not least because someone had in fact done it, and even gone so far as to file the bug.

I love edge cases. I'm pretty sure it started during the years I worked on Processing.js--anyone who has worked in projects with comprehensive test suites probably knows what I mean. With that project we had a massive set of existing Java-based Processing code that we needed to make work on the web. Anything that was possible in the Java implementation needed to work 1:1 in our JavaScript version. It was amazing the spectacular variety and extremes we'd see in how people wrote their code. Every time we'd ship something, a user would show up and tell us about some edge case we'd never seen before. We'd fix it, add tests, and repeat the cycle.

Doing this work over many years, I came to understand that growth happens at a project's edge rather than in the middle: the code we wanted to work on (some pet optimization or refactor) was rarely where we needed to be doing. Rather, the most productive space we could occupy was at the unexplored edges of our territory. And to do this, to really know what was "out there," we needed help; there was too much we couldn't see, we needed people to stumble across the uncultivated border and tell us what needed attention.

The thing about an edge case bug is that its seemingly trivial nature often points to something more fundamental: a total lack of attention to something deep in your system. The assumptions underpinning your entire approach, the ones you don't even know that you've made, suddenly snap into sharp focus and you're forced to face them for the first time. Even well designed programs can't avoid this: whatever you optimize for, whatever cases you design and test against, you necessarily have to leave things out in order to ship something. At which point reports of edge case bugs become an incredible tool in the fight to open your territory beyond its current borders.

It's easy to think that what I'm describing only applies to small or inexperienced teams. "Surely if you just engineered things properly from the start, you could avoid this mess." Yet big, well-funded, engineering powerhouses struggle just the same. Take Apple, for example. This week they've had to deal with yet another show-stopper bug in iOS--if you paste a few characters of text into your phone it crashes. There's a great write-up about the bug by @manishearth:

Basically, if you put this string in any system text box (and other places), it crashes that process...The original sequence is U+0C1C U+0C4D U+0C1E U+200C U+0C3E, which is a sequence of Telugu characters: the consonant ja (జ), a virama ( ్ ), the consonant nya (ఞ), a zero-width non-joiner, and the vowel aa ( ా)...then I saw that there was a sequence in Bengali that also crashed. The sequence is U+09B8 U+09CD U+09B0 U+200C U+09C1, which is the consonant “so” (স), a virama ( ্ ), the consonant “ro” (র), a ZWNJ, and vowel u ( ু).

Want to know which languages Apple's iOS engineers aren't using on a daily basis or in their tests? It's easy to point a knowing finger and say that they should know better, but I wonder how well tested your code is in similar situations? You can't believe how many bugs I find with students in my class, whose usernames and filepaths contain Chinese or other non-English characters, and software which blows up because it was only ever tested on the ASCII character set.

The world is big place, and we should all be humbled by its diversity of people and places. The software we build to model and interact with it will always be insufficient to the needs of real people. "Everyone does this..." and "No one would ever do that..." are naive statements you only make until you've seen just how terrible all code is at the edges. After you've been burned a few times, after you've been humbled and learned to embrace your own finite understanding of the world, you come to love reports of edge cases, and the opportunity they provide to grow, learn, and connect with someone new.

Show Comments