Experiments with audio, part II

I'm working on a project to expose audio spectrum data from Firefox's audio element. Today I document some of my background research and my first steps toward locating this data.

The <audio> element is part of the HTML5 spec, and you can already use it in Firefox 3.5 and above. Here's a page that uses it, for example. If we go spelunking through the Firefox code, we can see that <audio>, and the bits needed for these experiments, are spread across a number of files.

First, and perhaps least interesting, is the IDL definition of nsIDOMHTMLAudioElement, which inherits everything from nsIDOMHTMLMediaElement (note: if you are unfamiliar with IDL, and Mozilla's use of it, I've written about how we use it here). This tells us the sorts of things one can do with an audio element, that is, which attributes and functions it has. One important thing to note, for purposes of this experiment, is the presence of Mozilla-specific additions to this list, namely mozAutoPlayEnabled and mozLoadFrom. Just as Mozilla's CSS additions are named -moz*, we'll have to name our additional audio features with a moz* prefix.

Next we see the actual implementation of the nsIDOMHTMLAudioElement interface in nsHTMLAudioElement.cpp. Here are the aspects of the audio element relating to "content" or the DOM. Just as with the IDL, most of the functionality we care about really lives in nsHTMLMediaElement.cpp instead (e.g., here is the code for mozAutoPlayEnabled).

Reading through this code, I'm looking for some sort of connection to the underlying sound device, and for the place where spectrum data is dealt with. I notice numerous references to objects of type nsMediaStream, which sounds interesting. This directs my attention to the content/media/ directory, where I notice nsAudioStream.cpp, which includes a Write method.

I decide to try adding some debugging info and see what the data being written down to the audio handle looks like. The code seems to imply that I'm in the right spot, as data is being written down to the audio stream in a series of shorts. Using the same audio test page I linked to above, I run my modified build, and here is what I see:

[43 43 -1063 -1063 -70 -70 1350 1350 658 658 -414 -414 -476 -476 -1631 -1631 -3145 -3145 -2758 -2758 -1988 -1988 -2282 -2282 -2488 -2488 -2204 -2204 -2371 -2371 -1514 -1514 -230 -230 243 243 1950 1950 2835 2835 2398 2398 3106 3106 3970 3970 3281 3281 1736 1736 992 992 1482 1482 2162 2162 1552 1552 681 681 -69 -69 -835 -835 -1182 -1182 -884 -884 267 267 703 703 140 140 -187 -187 238 238 86 86 -724 -724 -1004 -1004 -683 -683 226 226 412 412 -366 -366 -2132 -2132 -3578 -3578 -3118 -3118 -2473 -2473 -1405 -1405 -1053 -1053 -2695 -2695 -3086 -3086 -2081 -2081 -1906 -1906 -1992 -1992 -2153 -2153 -2688 -2688 -2127 -2127 -1160 -1160 -409 -409 718 718 676 676 248 248 553 553 275 275 -105 -105 327 327 586 586 1110 1110 2734 2734 3373 3373 1627 1627 673 673 1353 1353 944 944 1712 1712 3272 3272 2751 2751 1092 1092 -116 -116 223 223 275 275 -1367 -1367 -2829 -2829 -837 -837 1115 1115 -401 -401 -437 -437 ...]
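One thing that stands out in the dump is that every value appears twice in a row. A plausible reading (an assumption on my part, not something the Firefox code above confirms) is that these are interleaved two-channel samples with identical left and right channels. If they are raw time-domain samples rather than spectrum data, getting a spectrum would still need a transform such as a DFT. Here is a minimal Python sketch of both ideas, using only the first few values from the dump; the helper names deinterleave and dft_magnitudes are made up for illustration and are not Firefox APIs.

```python
import cmath

# First 16 values copied from the console dump above.
dump = [43, 43, -1063, -1063, -70, -70, 1350, 1350,
        658, 658, -414, -414, -476, -476, -1631, -1631]


def deinterleave(samples, channels=2):
    """Split an interleaved sample list into one list per channel.

    Assumes the layout is [ch0, ch1, ch0, ch1, ...]; that layout is a guess.
    """
    return [samples[c::channels] for c in range(channels)]


def dft_magnitudes(frame):
    """Naive O(n^2) DFT; returns magnitudes for bins 0 through n // 2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


left, right = deinterleave(dump)
print(left == right)  # True for this excerpt: both channels carry the same values

# Scale to roughly [-1.0, 1.0), assuming signed 16-bit samples ("shorts").
normalized = [s / 32768 for s in left]
spectrum = dft_magnitudes(normalized)
print(spectrum)
```

Eight samples are far too few for a meaningful spectrum, and a real implementation would use an FFT over a much larger window; this only shows the shape of the computation.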

Looks just like the song sounds, doesn't it? I have no idea, but I've sent my patch and the data off to my audiophile colleagues for analysis. Maybe we're on the right track, maybe I'm just pulling numbers out of the air. At any rate, I'm learning more about this code. Next time I hope to do a bit of work to get this data exposed in a more usable way (right now it's just a console dump). That will likely be the subject of part III.