Introducing DXR - Bringing Static Analysis to the Web

For a while now, whenever I've had spare time, I've been "building a boat in my basement."  Today I'm pleased to bring it out and launch it:  I'd like to announce the first public release of DXR, version 0.1.

DXR is two things.  First, it's a method for collecting type, member, statement, macro, etc. information about C++, IDL, and soon JavaScript using Mozilla's static analysis tools (e.g., Dehydra), a hacked build system, and a bunch of scripts.  Second, it's a web-based tool for mapping this information back onto source code, and allowing users to query and look up this information.

There are already some really well-known tools for creating source code cross-references, from LXR to MXR to OpenGrok.  While these are all based on "search" and various ways of indexing text, DXR is about reducing the need to do searches, and instead being able to look up data.
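To make the search versus look-up distinction concrete, here is a minimal Python sketch.  Everything in it is made up for illustration (the file names, the `Definition` record, and the `DEFS` table); it is not DXR's actual data model, just one way to picture the difference.

```python
from typing import NamedTuple, Optional


class Definition(NamedTuple):
    kind: str
    file: str
    line: int


# Hypothetical source lines, standing in for an indexed tree.
SOURCE = {
    ("nsFoo.h", 10): "class nsFoo {",
    ("nsFoo.h", 11): "  void Init();",
    ("nsFoo.cpp", 3): "// Init() must be called first",
    ("nsFoo.cpp", 5): "void nsFoo::Init() {",
    ("nsBar.cpp", 20): "  foo->Init();",
}


def text_search(needle: str) -> list:
    """Search-style: return every location whose text contains the needle."""
    return sorted(loc for loc, text in SOURCE.items() if needle in text)


# Look-up-style: facts recorded at analysis time, keyed by qualified name.
DEFS = {
    "nsFoo": Definition("type", "nsFoo.h", 10),
    "nsFoo::Init": Definition("function", "nsFoo.cpp", 5),
}


def lookup(qualified_name: str) -> Optional[Definition]:
    """Return the recorded definition, or None if the name is unknown."""
    return DEFS.get(qualified_name)
```

Here `text_search("Init")` returns the comment, the declaration, the definition, and the call site alike, and the reader has to sort them out.  `lookup("nsFoo::Init")` goes straight to the one recorded definition.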

The best way for me to describe it is to show you, so I've put together a short demo (watch on YouTube or if you're using a snazzy Firefox 3.5 beta, you can view this .ogv file directly).

I've put together a live demo of DXR at http://dxr.proximity.on.ca/dxr/ which has both mozilla-central and comm-central available.  There are a couple of points to be aware of with this site.  First, this data is not being updated daily, and I'm using revisions 7a8502b70fdf and 7b153b079c94 respectively.  Second, it's not running on a proper server, so I don't guarantee up-time or performance.  Once the code is improved a bit we'll hopefully move it to a proper box, use something other than SQLite for the back-end, and start doing regular indexing.

Having said all that, it's quite usable.  If you're a Mozilla developer, you'll be able to navigate your way around and quickly see how this is different from what we have today.  If you're not familiar with the Mozilla code, here are some examples you can look at:

A few other things I'll point out about this release:

  • The data represents a Linux-only C++/IDL analysis (i.e., no C, no Mac- or Windows-only code).  To date this has not been done on Mac or Windows.  As such, you'll probably hit files where there seems to be incomplete data, and this is probably why.  Dehydra works on Mac, and we'll likely use cross-compiling to get a Windows analysis.
  • This code is heavily tied to eccentricities in the Mozilla code base (I can tell you first hand that our code has many strange and scary corners!).  Could you use this to index other large C++ programs?  Yes, but not without modification.  If you're interested in trying, get in touch and I'll help guide you through the process.
  • There are lots of places where we have bugs in the data.  Some of them are due to GCC bugs, some to Dehydra bugs, and lots are due to my bugs using these tools.  For example, I don't currently deal with globals, C++ fragments in IDL, etc.  If you wonder why something isn't clickable, or why you click and get an empty pop-up, "it's a bug."
  • I'm still struggling for a way to show all the data I have.  The current UI is based on my own ideas, and the feedback of a small group of Mozilla developers.  There's more we could collect and show if I could figure out how, and I'd love some UX/UI people to give a hand with that.  If you have recommendations, also let me know.
    I said earlier that DXR is two things, and the web app is just one way I can imagine using this data.  Could this data be integrated into Eclipse or Komodo or emacs?  Sure it could, and it's only waiting for you to do it.  I'd be happy to provide pointers or work with you to get the data in a form that makes this possible.  Speaking of which, you can download my SQLite databases, mozilla-central.sqlite.zip and comm-central.sqlite.zip, on their own.  The database schema is here.
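If you want to poke at the downloaded databases before reading the schema, a small Python sketch like the following lists the tables and columns in any SQLite file.  It relies only on SQLite's built-in `sqlite_master` table and `PRAGMA table_info`, so it assumes nothing about DXR's real schema.  The `types` table in `demo()` is invented purely so the example runs on its own.

```python
import sqlite3


def describe_database(conn: sqlite3.Connection) -> dict:
    """Return {table_name: [column names]} for every table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # Quote the identifier so unusual table names are handled safely.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [col[1] for col in columns]
    return schema


def demo() -> sqlite3.Connection:
    """Build a stand-in database; this table is hypothetical, not DXR's."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE types (name TEXT, file TEXT, line INTEGER)")
    conn.execute("INSERT INTO types VALUES ('nsFoo', 'nsFoo.h', 10)")
    return conn
```

To try it against real data, unzip one of the downloads and pass `sqlite3.connect(path_to_the_unzipped_file)` to `describe_database` instead of the `demo()` connection.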

I'm really happy that this is finally in a state I can share with others.  My next steps will be to improve the code and quality of the data.  I'd also like to start adding other data: anything that can be mapped to source code lines and tokens is a potential candidate (performance data, bug info, documentation links, hooks for rewriting tools, etc.).  Maybe you've got ideas and would like to get involved.  I look forward to hearing from you.

The best way to get in touch is via email, the Mozilla Static-Analysis mailing list, or on IRC (I'm humph on moznet and can be found in #static, among other places).