I'm working with a group of web and Mozilla developers, along with some talented audiophiles, on a project to expose audio spectrum data to JavaScript from Firefox’s audio and video elements. Today we teach the browser how to sing.
Our experiments to expose audio data to JavaScript have really started to get exciting. Since I last posted, a number of new features and demos have been created, and our group of developers has continued to expand. This was greatly helped by the fact that the story was picked up by Ajaxian, Hacker News, and Reddit. If you'd like to help us experiment more, please get in touch.
Previously I linked to a video of a demo made by Corban Brook, in which he uses JavaScript to calculate an FFT and then visualizes the resulting spectrum data. After this test we wanted to see how expensive these calculations were in script, so I ported his code to C++ and stuck it into the audio element's decode loop. A number of people have expressed concern about doing this in JavaScript, both because of the speed and because of the time it takes away from other work (e.g., complex real-time graphics). Therefore, I wanted a good way to compare speeds. I've already received feedback in the bug that native FFT calculations probably aren't warranted, and that we should optimize the JavaScript case instead. I anticipated this, and having been involved with as much 3D in the browser as we have, I know that JavaScript is up to the challenge of doing very fast calculations. As soon as I convert over to the faster WebGL arrays I'll run some tests comparing the JS FFT and the native FFT so we have better numbers. In the meantime, the C++ FFT is working really well, and a number of new visualizations have been built with it, including this one by Thomas Saunders.
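To give a feel for the kind of work involved (this is not Corban's actual code, just a minimal sketch): given one frame of time-domain samples from the decode loop, you compute a magnitude spectrum and draw it. The naive DFT below is O(n²) and only for illustration; a real implementation uses an O(n log n) FFT, which is exactly the part we want to benchmark in JS vs. C++.

```js
// Minimal sketch: magnitude spectrum of one frame of samples via a naive DFT.
// 'samples' is assumed to be an array of floats from the audio decode event.
function magnitudeSpectrum(samples) {
  var n = samples.length;
  var spectrum = new Array(n / 2);
  for (var k = 0; k < n / 2; k++) {
    var re = 0, im = 0;
    for (var t = 0; t < n; t++) {
      var angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    spectrum[k] = Math.sqrt(re * re + im * im) / n;
  }
  return spectrum; // could then be drawn to a <canvas> as bars
}
```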
Another thing we've got working is generating and writing audio directly from JavaScript. So far in these experiments I've been focused on exposing a real-time event that delivers raw audio data as it's decoded. I did this in response to the various Flash and other audio developers who have all told me they need some way to get at the data while it's being played. Now that this is working well, I decided to turn my attention to what Vlad suggested in the bug, namely a get/set style method.
I've been stumped on the right way to do this for a while. Actually adding the method is pretty easy. What I've been trying to think through is how you deal with the linearity of audio, and the fact that, unlike a canvas (which also lets you do a get/set), the audio data isn't necessarily there to get (e.g., it may not have been downloaded or decoded yet). I'm sure there are good solutions to these problems, but they've eluded me thus far.
I ended up chatting about it with Rob Campbell and Ted Mielczarek, and together we hatched a plan: what if, for the sake of experimenting, you just added a "write" method and let JavaScript dump raw audio? Imagine an <audio> element with no source, totally controlled by JavaScript. Since I already have an event that gives data as it's decoded, it would then be possible to have one audio element decode, get the data and transform it somehow in script, then dump it into a second audio element. Is this the right way? Probably not (the implementation is completely wrong no matter what). But it was simple enough to do a Friday-afternoon-evil-hack and write a patch. So I did.
I added two new methods to audio/video:
- mozSetup(channels, rate, volume) // called to create the audio stream
- mozWriteAudio(buffer[], bufferCount) // called to write buffer to the stream
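To show how these two methods fit together, here is a rough sketch of a tone generator along the lines of what Ted built. It uses the signatures exactly as listed above; the patch's actual semantics (sample format, expected range, how buffers are paced against playback) are assumptions here, not the final design.

```js
// Sketch: drive a source-less <audio> element entirely from script.
var audio = document.createElement("audio"); // no src attribute
var channels = 1, rate = 44100, volume = 1;
audio.mozSetup(channels, rate, volume);      // create the audio stream

var frequency = 440;          // A4
var samplesPerBuffer = 4096;  // arbitrary buffer size for this sketch
var phase = 0;

function writeTone() {
  // Assumes samples are floats in [-1, 1]; the real patch may differ.
  var buffer = new Array(samplesPerBuffer);
  for (var i = 0; i < samplesPerBuffer; i++) {
    buffer[i] = Math.sin(2 * Math.PI * frequency * (phase + i) / rate);
  }
  phase += samplesPerBuffer;
  audio.mozWriteAudio(buffer, buffer.length); // dump raw audio to the stream
}

// Keep feeding buffers; real code would pace writes against playback.
setInterval(writeTone, Math.floor(1000 * samplesPerBuffer / rate));
```

Changing `frequency` between buffers is all it takes to move from a single tone to something like the scale player mentioned below.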
Ted then wrote a simple Tone Generator, and Al MacDonald took that and wrote what I believe to be the first ever HTML document that can play scales using nothing but JavaScript (seriously, check out the source for yourself)! Here's a video of it in action:
Next time I hope to show some other demos that my colleagues have created or are creating now. It is ridiculously fun to iterate on this stuff. That's partly due to the fact that I've got such a talented group of people working with me. But it's also due to how "hackable" this code is to begin with. The fact that we can go from a back-of-the-napkin idea and then turn that into a working (if somewhat evil) patch in a couple of hours speaks to how well this stuff was written in the first place. There's a good discussion going right now about View Source, and how important it is. What we're doing in these experiments is View Source all the way down: we work in terms of HTML5 and the Open Web, but underneath is this amazing browser and platform, and underneath that the audio lib, etc. The only thing that enables this kind of experimentation and collaboration is the existence of View Source, and the kinds of communities that can form around it.