DXR pre-alpha: Source Code Lexicography

Today I hit a major milestone in a project I've been working on for more than a year.  My goal is to replace MXR with a version that includes static analysis data for the C++, IDL, and JS in the Mozilla tree.  I don't quite know what to call the project, but the name 'DXR' has been used by enough people that I'll probably just call it that. I like to call the general process of what I'm doing 'Source Code Lexicography,' and I'll say something about that at the end of this post.

This morning I was able to solve the "Dave Mandelin MXR Problem," and I'm so happy I had to pause to blog about it. The "Dave Mandelin MXR Problem" goes like this:

[L]ook at some random line of source code, like nsHTMLImageAccessible.cpp:238 [ed. privatePrevAccessible->SetParent(this)]. To know what this does, I want to navigate to the code for SetParent. I can click on it, but then I get a page with links to a dozen or so different SetParent functions, and I don’t know which ones can actually be called on line 238.

To get the right answer, first I have to look up the type variable, privatePrevAccessible, which turns out to be nsPIAccessible. I used MXR to do that, but I had to scan the page for my file name and then pick one of two possible variable definitions (one of which was actually not a definition). So then I can use MXR on SetParent again and search for nsPIAccessible, but I don’t find anything. I try nsPIAccessible in MXR, but I get nothing for that. It would be way too boring to read everything I did after that, but another 10-15 minutes of find, grep, and MXR to find and verify (tracing the ancestry of nsListBulletAccessible was especially tedious) that the SetParent methods that can get called are at

accessible/src/base/nsAccessible.cpp:457
accessible/src/html/nsHTMLTextAccessible.cpp:338

After I read this in March 2008, I set out to solve it.  It hasn't been easy, but I have learned an insane amount about Mozilla's classes, build system, etc.  Throughout that time I've partly measured my progress by how close I could come to solving Dave's original problem--it's been my main test case (who needs small unit tests?).  And I've solved it.

Here's what it all looks like.  DISCLAIMER: I'm not done, and this working UI is only a few days old...it all needs work.  When we go to nsHTMLImageAccessible.cpp we see this (line 194 is the one Dave was talking about in his post):

DXR Source Code View

The colours you see indicate what everything is (macros, strings, comments, types, members, statements, etc.). It's overly colourful at this point (I know I'm not alone in liking colour), because I'm still debugging and want to see how accurate my tokenizer/markup code is. If I want to know what SetParent is in line 194, I just need to click on it:

DXR Statement Info

This sends a request back to the server, and does a lookup in the db (~1.2G of sqlite data) for more info about this call, before displaying it in a pop-up. The database "knows" what SetParent in line 194 of this file is--it knows exactly what it is (thanks, Dehydra!), and I can use that to query for other info. From this pop-up I can see that SetParent is really nsPIAccessible::SetParent, and even where it is declared in IDL (ignore the lack of links, they're coming). It gets more interesting when I click on "Implementations":

DXR Statement Implementations
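
For readers wondering what a lookup like this might involve, here is a minimal sketch in Python using an in-memory sqlite database. To be clear, this is not DXR's actual schema or code: the table names (calls, bases, members), the helper functions, and the toy rows are all assumptions made up for illustration, and the pairing of nsListBulletAccessible with nsHTMLTextAccessible.cpp is a guess based on Dave's description. The idea is simply that once the database knows the static type at a call site, the possible implementations can be found by walking the derived classes of that type.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory database with a HYPOTHETICAL schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE calls   (file TEXT, line INTEGER, name TEXT, static_type TEXT);
        CREATE TABLE bases   (derived TEXT, base TEXT);
        CREATE TABLE members (type TEXT, name TEXT, file TEXT, line INTEGER);
    """)
    # Illustrative rows only; class/file pairings are assumptions.
    db.execute(
        "INSERT INTO calls VALUES (?, ?, ?, ?)",
        ("nsHTMLImageAccessible.cpp", 194, "SetParent", "nsPIAccessible"),
    )
    db.executemany("INSERT INTO bases VALUES (?, ?)", [
        ("nsAccessible", "nsPIAccessible"),
        ("nsListBulletAccessible", "nsAccessible"),
    ])
    db.executemany("INSERT INTO members VALUES (?, ?, ?, ?)", [
        ("nsAccessible", "SetParent",
         "accessible/src/base/nsAccessible.cpp", 457),
        ("nsListBulletAccessible", "SetParent",
         "accessible/src/html/nsHTMLTextAccessible.cpp", 338),
        # A same-named method on an unrelated class must not match.
        ("nsUnrelated", "SetParent", "elsewhere.cpp", 1),
    ])
    return db


def resolve_call(db, file, line, name):
    """Return the static type recorded for a call site, or None."""
    row = db.execute(
        "SELECT static_type FROM calls WHERE file = ? AND line = ? AND name = ?",
        (file, line, name),
    ).fetchone()
    return row[0] if row else None


def implementations(db, static_type, name):
    """Find definitions of `name` on static_type or any type derived from it."""
    return db.execute("""
        WITH RECURSIVE subtypes(t) AS (
            SELECT ?
            UNION
            SELECT derived FROM bases JOIN subtypes ON bases.base = subtypes.t
        )
        SELECT members.type, members.file, members.line
        FROM members JOIN subtypes ON members.type = subtypes.t
        WHERE members.name = ?
        ORDER BY members.file
    """, (static_type, name)).fetchall()


if __name__ == "__main__":
    db = build_toy_db()
    static_type = resolve_call(db, "nsHTMLImageAccessible.cpp", 194, "SetParent")
    for cls, path, line in implementations(db, static_type, "SetParent"):
        print(f"{cls}::SetParent at {path}:{line}")
```

The recursive query is what saves the manual "tracing the ancestry" step from Dave's post: it collects every type that derives, directly or indirectly, from the static type, and only then looks for methods with the right name.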

Now I can see that there are exactly two implementations this call could resolve to, and where they are declared (I know where they are defined as well, who calls them, etc.). I can get similar info about types. For example, here's nsIDOMHTMLCollection:

DXR Type Info

DXR Type Members

DXR Type Bases

DXR Type Derived

I've got all of the macros, too:

DXR Macro

Everything in the source file that means something to gcc-dehydra, xpidl, or SpiderMonkey is something we can layer on top of this view. I'm still trying to work out the best UI for showing other types of things. For example, I know "all" the paths that get you to a function (I'm still working on func pointers), caller info, declarations, etc. And there is lots more we could put inside those pop-ups, such as graphs showing data related to a certain line of code.

There is still a lot I need to do, including making it fast (it takes ~75 mins to generate the database, and ~3-10 seconds per html file), making the data more accurate, fixing the UI, adding more functionality, merging my jshydra analysis into the build system, etc. But it's finally something I can show that isn't a bunch of database queries! I hope to have a public release in the coming month.

I said at the start of this that what I'm doing is Source Code Lexicography. I like to think of my project as one of building an OED for Mozilla. The only way I've been able to cope with Mozilla is by having access to tools like MXR, to help me figure out what things mean and how to use them. I couldn't function without it. With my education hat on, I have my eye set on a day when new people can more easily jump into working on our code with access to the wealth of knowledge DXR will provide.

If you want to try this for yourself, are interested in helping me improve this, want to be a tester and provide feedback, or otherwise get involved, please get in touch by email or on irc (humph in #static). I'm also going to be looking for a web host for it.

The static-analysis community within Mozilla is really getting to be an exciting place. If I've proven anything doing this work, it's that you don't need to be a compiler guru to get involved.  I'm easily the "dumbest guy" there, but I'm learning a lot, and you can too.