After yesterday's experiments with the dehydra js api, I decided to test my theory and have the dehydra scripts accumulate their findings in a sqlite database. I've never worked with sqlite in the past, but I'm a believer now.
I decided to focus on type, inheritance, and member data, and wrote scripts to pull this from the tree and then create .sql files to populate a database with the info. For purposes of testing my hypothesis, I cheated and wrote a script to build just the bits I knew I'd need based on Dave's example. I had to tweak things so that duplicate rows get thrown away, and in the end I can get very close:
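One way to get duplicate rows thrown away is to let sqlite do it. The following is only a minimal sketch, not the scripts I actually ran: it assumes the members table has just the three columns used in the query in this post (mtname, mname, mloc), that a row counts as a duplicate when all three match, and it uses Python's sqlite3 module rather than generated .sql files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: only the three columns that appear in the post's query.
# The UNIQUE constraint is what lets sqlite reject repeated rows.
conn.execute("""
    CREATE TABLE members (
        mtname TEXT,
        mname  TEXT,
        mloc   TEXT,
        UNIQUE (mtname, mname, mloc)
    )
""")

rows = [
    ("nsAccessible", "SetParent(nsIAccessible*)",
     "/home/dave/mozilla-central/src/accessible/src/base/nsAccessible.h:111"),
    # The same member seen again while processing another translation unit.
    ("nsAccessible", "SetParent(nsIAccessible*)",
     "/home/dave/mozilla-central/src/accessible/src/base/nsAccessible.h:111"),
    ("nsPIAccessible", "SetParent(nsIAccessible*)",
     "../../../dist/include/accessibility/nsPIAccessible.h:39"),
]

# One transaction for the whole batch; OR IGNORE silently skips any row
# that would violate the UNIQUE constraint.
with conn:
    conn.executemany(
        "INSERT OR IGNORE INTO members (mtname, mname, mloc) VALUES (?, ?, ?)",
        rows,
    )

member_count = conn.execute("SELECT COUNT(*) FROM members").fetchone()[0]
print(member_count)
```

The same INSERT OR IGNORE statement works inside a plain .sql file too, provided the table was created with a matching UNIQUE constraint.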
sqlite> select mtname,mloc from members where mname = "SetParent(nsIAccessible*)";
nsPIAccessible|../../../dist/include/accessibility/nsPIAccessible.h:39
nsAccessible|/home/dave/mozilla-central/src/accessible/src/base/nsAccessible.h:111
nsHTMLListBulletAccessible|/home/dave/mozilla-central/src/accessible/src/html/nsHTMLTextAccessible.h:110
After I got this running, Dave emailed to confirm that I'm on the right track (hurray for not having to throw this all out!). I'm still struggling to figure out how to best deal with the inheritance tree, since my current method gives me one level of inheritance, but in the case above, I really need multiple levels.
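For the multi-level problem, one option is to store only direct base/derived pairs and let a recursive query walk the rest. This is a sketch under assumptions, not what my database currently does: the impl table and its tbname/tcname columns are hypothetical names, the class names are toy data, and WITH RECURSIVE needs SQLite 3.8.3 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per direct inheritance edge
# (tbname = base class, tcname = class derived from it).
conn.execute("CREATE TABLE impl (tbname TEXT, tcname TEXT)")
conn.executemany(
    "INSERT INTO impl (tbname, tcname) VALUES (?, ?)",
    [
        ("Base", "Middle"),      # toy class names, not from the real tree
        ("Middle", "Leaf"),
        ("Base", "Sibling"),
        ("Unrelated", "Other"),
    ],
)


def descendants_of(conn, base):
    """Return every class that derives from `base`, at any depth."""
    query = """
        WITH RECURSIVE descendants(name) AS (
            SELECT tcname FROM impl WHERE tbname = ?
            UNION
            SELECT impl.tcname
            FROM impl JOIN descendants ON impl.tbname = descendants.name
        )
        SELECT name FROM descendants ORDER BY name
    """
    return [row[0] for row in conn.execute(query, (base,))]


result = descendants_of(conn, "Base")
print(result)
```

UNION (rather than UNION ALL) drops repeated names, so a class reached through two paths is listed once. Joining the resulting names against members.mtname would then give every override of a method below a given base class.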
I have yet to do a complete build/db import, and I fear that it might be too much for sqlite (nsAccessible.cpp + nsHTMLImageAccessible + nsHTMLTextAccessible ~= 2.7M db file). However, database back-end notwithstanding, I think I'm getting close to being able to do this.
UPDATE: I kicked off a build of the whole tree, which ends up producing 1,857 .sql files containing insert statements for type, member, and inheritance info. After running these scripts (109 minutes! because duplicate rows are weeded out by brute force), I'm left with a 46M db file representing the entire tree. I can do the query above in 1.8 seconds, or count all classes in 0.009 seconds. Not bad, not bad at all.
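If the 1.8 seconds comes from sqlite scanning the whole members table, an index on mname might bring it down. I haven't measured this, so treat it as a sketch: the index name idx_members_mname is made up, and the three-column schema is the same assumption as before. EXPLAIN QUERY PLAN shows whether sqlite would actually use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: only the columns used by the query in the post.
conn.execute("CREATE TABLE members (mtname TEXT, mname TEXT, mloc TEXT)")

# Hypothetical index name; it covers the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_members_mname ON members (mname)")

query = "SELECT mtname, mloc FROM members WHERE mname = ?"

# Ask sqlite how it would run the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN " + query, ("SetParent(nsIAccessible*)",)
).fetchall()

# The human-readable description is the last column of each plan row.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the printed plan mentions the index rather than a full scan of members, the lookup no longer depends on the size of the table.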