Web App JavaScript Crash Reporting

We just launched Popcorn Maker 1.0 this weekend at Mozilla Festival 2012, and while the focus of the app is obviously web video remixing, one of our favourite features as a dev team was the JavaScript Crash Reporter.  Bobby and I wrote a little bit about it in our Mozilla Hacks post, but I wanted to say some more about the potential for this technique.

Working on Firefox code, I've long been a fan of the Mozilla Crash Reporter, which uses Google's crash reporting system, Breakpad, and the Socorro server.  Typing about:crashes in Firefox's address bar will show you any crashes you've logged, each linking to a report on crash-stats.mozilla.com.  Here's an example.  When you're lucky you get a stack showing where things went wrong, info about bugs related to this crash, aggregated data for other users' crashes in the same spot, etc.

For developers it's an incredible source of information, since often these crashes occur out in the wild, and under circumstances the developers can't reproduce.  Crash reporting in native apps isn't new, of course.  OS and application vendors routinely use this data-driven debugging method to gather useful data about edge cases in their code, and to make corrections.

The technique isn't used as often in JavaScript applications, though.  More often than not, the data a web app collects from its users is about marketing or tracking rather than improving development and stability.  Another reason is that many pieces of JavaScript are libraries rather than apps, and get used on more than one installation, making it undesirable for that code to phone home to report an issue.

With Popcorn Maker we wanted to make it easy for our users to give feedback and report issues.  Because the project isn't for developers, asking that they find and file bugs in our issue tracker wasn't realistic.  So we made a Feedback button and form, and also an automated Crash Reporter to deal with top-level exceptions (i.e., those that can be caught using window.onerror).
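To make the "top-level exceptions" part concrete, here's a rough sketch of hooking window.onerror.  It is not our actual code: `installCrashHandler`, `onCrash`, and `showCrashDialog` are hypothetical names, and the handler takes a window-like object as an argument so it can be tried outside a browser.

```javascript
// Hypothetical helper: installs a top-level error handler on a window-like
// object and forwards the details to a callback supplied by the app.
function installCrashHandler(target, onCrash) {
  var previous = target.onerror;

  target.onerror = function (message, filename, lineno) {
    // Hand the three classic onerror arguments to the reporter.
    onCrash({ message: String(message), filename: filename, lineno: lineno });

    // Chain to any handler that was installed before ours.
    if (typeof previous === "function") {
      return previous.apply(this, arguments);
    }
    // Returning false lets the browser log the error as usual.
    return false;
  };
}

// In a browser (showCrashDialog is a made-up callback name):
// installCrashHandler(window, showCrashDialog);
```

Chaining to the previous handler keeps this from clobbering anything else on the page that already listens for errors.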

We've been using these techniques now for about a month, since the beta launch for Ryan's TED talk on Popcorn, and through the 1.0 launch, and I can now talk about what it's like using them.  First, and perhaps not surprisingly, people don't tend to leave much feedback.  It takes time to write free-form comments in a form, and most people don't want to bother.  So far we've gotten between 3 and 6 a day.  The quality of what people say is usually quite high, though.  When they have bothered to do the work of filling out the form, they've taken care to do it well, and usually give us ideas for new features.

The crash reports are another story.  This morning when I sat down to check my email, I had over 50 waiting for me.  I was able to file 8 new bugs (many of the reports were the same issue hit by different people).  I was also able to see that we've regressed one bug we thought we'd killed completely, that we're having issues in the Sogou and Zune/Media Centre browsers, that we have a timezone-related bug in some of our date parsing, and that I finally have better data about a crash we first saw last week (i.e., different browsers report the error differently, giving us a more complete picture of what is actually failing).  That's data from just a single day!

I can't emphasize enough how drastically this technique has changed our development practices.  In the period between 0.9 and 1.0 we found and fixed 12 bugs that our team couldn't reproduce (there are another 12 on file that we're still trying to fix).  We put a lot of effort into automated and manual testing, and work hard on cross-browser testing, but there is no way for us to hit all these issues.  Furthermore, there's no way people would file all these issues.  What would they say?  "It didn't work" isn't helpful for them or us.  We need to know what failed so it doesn't happen again.  Rather than having a web app that just silently fails, we give our users the chance to become part of the development process by submitting the crash data, and a lot of them do.

You can try simulating a crash by going to http://popcorn-dev.rekambew.org/templates/basic/ and doing Butter.app.simulateError() in your console.  If the user wants to add extra info, they can (most don't), and then if they click 'Yes' we send anonymous data about the error.  For example:

  • Which version of our code they were using
  • Which media URL(s) they tried
  • Data from the onerror handler--message, filename, line number
  • States the app had hit before the crash (i.e., which events we'd processed internally, like mediaready)
  • Any DOM nodes that we tried to get and were null

We take that data and store a JSON report on Amazon S3.  Here is an example:
{  
    "date": "Tue Nov 13 2012",  
    "message": "TypeError: 'undefined' is not a function (evaluating 'tracksContainer.container.getBoundingClientRect()')",  
    "appUrl": "https://popcorn.webmaker.org/templates/basic/?savedDataUrl=ted.json",  
    "scriptUrl": "https://popcorn.webmaker.org/src/butter.js",  
    "lineno": 9805,  
    "mediaUrl": [  
        "https://www.youtube.com/watch?v=0g2WE1qXiKM&butteruid=1352812826605"  
    ],  
    "stateList": [  
        "trackeventadded",  
        "trackeventadded",  
        "trackeventadded",  
        "trackadded",  
        "trackeventadded",  
        "trackadded",  
        "trackeventadded",  
        "trackeventadded",  
        "ready",  
        "mediaready"  
    ],  
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/536.26.17 (KHTML, like Gecko) Version/6.0.2 Safari/536.26.17",  
    "popcornVersion": "v1.3-44-g05ac7ef",  
    "butterVersion": "v1.0.4",  
    "nullDomNodes": ".media-status-container, .youtube-subscribe-embed, [data-tooltip]",  
    "comments": ""  
}
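
For illustration, a report with these fields could be assembled roughly like this.  This is a sketch under assumptions, not our implementation: `buildCrashReport`, `appState`, and `env` are hypothetical names standing in for whatever bookkeeping an app already does.

```javascript
// Hypothetical sketch: assembles a report shaped like the example above.
// `error` holds the onerror data, `appState` the app's own bookkeeping,
// and `env` the page environment (clock, URL, user agent).
function buildCrashReport(error, appState, env) {
  return {
    date: env.now.toDateString(),
    message: error.message,
    appUrl: env.appUrl,
    scriptUrl: error.filename,
    lineno: error.lineno,
    mediaUrl: appState.mediaUrls.slice(),
    stateList: appState.stateList.slice(),
    userAgent: env.userAgent,
    popcornVersion: appState.popcornVersion,
    butterVersion: appState.butterVersion,
    // The example stores missing selectors as one comma-separated string.
    nullDomNodes: appState.nullDomNodes.join(", "),
    comments: appState.comments || ""
  };
}
```

Passing the result through JSON.stringify() gives a body that can be sent to whatever server stores the reports.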

There are a few gotchas to be aware of:

  • You need to filter your errors by filename so you don't report a crash when trivial things happen in scripts you include.  In our case, we use a lot of third-party APIs which can spam the console with errors that aren't actually a problem--the YouTube iframe tries to read values in the parent document and fails with cross-origin errors.
  • Firefox has an annoying bug where errors in simulated events get eaten, and don't make it up to the top level; see https://bugzilla.mozilla.org/show_bug.cgi?id=503244
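
The filename filter from the first gotcha might look something like the sketch below.  The prefix list and the function names are assumptions for illustration; the only real detail is the script URL pattern taken from the sample report.

```javascript
// Hypothetical allow-list: only errors raised from these script URL prefixes
// are treated as crashes of our own app.
var OWN_SCRIPT_PREFIXES = ["https://popcorn.webmaker.org/src/"];

function isOwnScript(filename, prefixes) {
  if (typeof filename !== "string" || filename === "") {
    // Errors from cross-origin scripts often arrive with no usable filename.
    return false;
  }
  return prefixes.some(function (prefix) {
    return filename.indexOf(prefix) === 0;
  });
}

// Decide whether an onerror event should open the crash reporter at all.
function shouldReport(error) {
  return isOwnScript(error.filename, OWN_SCRIPT_PREFIXES);
}
```

An allow-list is the safer default here: a new third-party script is ignored until you decide you care about it.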

There are also other things I'd love to figure out how to do better:

  • How should we deal with minified code?  For now we've left our code unminified in order to get better data about what the failure was (e.g., function/var names vs munged names), and real line numbers.  It would be great to do something with source maps or the like.

  • How to get a stack?  By the time onerror happens, the stack is unwound, and we don't have good data about the full issue.  Sometimes the top-level error message is useful, sometimes it isn't.  Knowing what happened before that would be great.  I've thought about writing wrappers around our major module blocks to catch exceptions, cache the error (and stack if there), then rethrow.  The crash reporter could look up the last error(s) in the cache and add that data.
  • It would be great to add some kind of assert API to the reporter.  Imagine if you could assert not to the console, but to the reporter, and if there's a crash, send along failed assertion data.
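
To show what I mean by the last two ideas, here's a rough sketch of a wrapper that caches errors before rethrowing, plus an assert that reports to the crash reporter instead of the console.  None of this exists in Popcorn Maker; `guard`, `reportAssert`, `crashContext`, and the cache size are all made up for illustration.

```javascript
// Hypothetical sketch of the ideas above; none of this is in Popcorn Maker.
var MAX_CACHED = 5;
var recentErrors = [];     // last few caught errors, oldest first
var failedAssertions = []; // assertion messages that did not hold

function remember(list, item) {
  list.push(item);
  if (list.length > MAX_CACHED) {
    list.shift(); // drop the oldest entry
  }
}

// Wrap a function so that any exception is cached (with its stack, if the
// engine provides one) and then rethrown, so it still reaches window.onerror.
function guard(name, fn) {
  return function () {
    try {
      return fn.apply(this, arguments);
    } catch (e) {
      remember(recentErrors, {
        module: name,
        message: String(e && e.message ? e.message : e),
        stack: e && e.stack ? String(e.stack) : null
      });
      throw e;
    }
  };
}

// Assert to the reporter instead of the console: failures are only recorded.
function reportAssert(condition, message) {
  if (!condition) {
    remember(failedAssertions, message);
  }
  return Boolean(condition);
}

// Extra data a crash reporter could merge into its report.
function crashContext() {
  return {
    recentErrors: recentErrors.slice(),
    failedAssertions: failedAssertions.slice()
  };
}
```

The crash reporter would call crashContext() when onerror fires and attach the result to the report, so the stack captured at the throw site survives even though the top-level handler never sees it.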

I'd also love to not have to build a full UI for working with the reports :)  I think it would be great if web app developers could use Mozilla's Socorro infrastructure to gather and analyze crashes.  Imagine if part of doing a Mozilla Marketplace app was getting access to that infrastructure?  Chris Lonnen and I discussed this idea at the Festival.

At some point I'd like to rip the crash reporter out of Popcorn Maker and turn it into a stand-alone repo others can use.  The code is here, if you're interested in seeing how we do it.  I'd highly encourage other web app developers to go this route.  It's been a game changer for our team and app.