Experiments with audio, part X

I'm working with an ever-growing group of web, audio, and Mozilla developers on a project to expose audio data to JavaScript from Firefox's audio and video elements. Today we show you how much JavaScript can really do.

Since my last post, quite a few new people have joined our group, a lot has changed in our implementation, and we've achieved a few things worth writing about. I also can't keep these demos under wraps any longer, so it's time for another post.

One of the first pieces of advice I got in the bug, when I started writing this patch to expose audio data in Firefox, was to use Vlad's new typed arrays (aka WebGL Arrays). My first implementation used an array-like object to expose the audio data, and JS arrays for writing samples. Both worked, but neither was as fast as we wanted, and both required various hacks to work around performance issues. Vlad was kind enough to give me a crash course on implementing them via quickstubs, and over the past few weeks Yury Delendik and I have worked long hours rewriting our entire implementation to use them.
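As a rough illustration of what working with typed arrays looks like, here is a minimal TypeScript sketch (not code from the patch) that fills a Float32Array with an interleaved stereo sine tone, the kind of sample buffer you might hand to an audio-write call. The helper name `makeStereoTone` and the interleaved left/right layout are assumptions for illustration. Because a Float32Array is a fixed-length, contiguous block of 32-bit floats, the engine never has to check what type each element holds, which is a large part of why typed arrays suit this kind of work better than plain JS arrays.

```typescript
// Hypothetical helper: builds an interleaved stereo buffer (L, R, L, R, ...)
// holding a sine tone. A Float32Array is a fixed-length block of 32-bit floats,
// so every element is always a number of known type, unlike a plain JS array.
function makeStereoTone(
  frequencyHz: number,
  sampleRate: number,
  frameCount: number,
  amplitude = 0.5,
): Float32Array {
  const channels = 2;
  const samples = new Float32Array(frameCount * channels); // starts zero-filled
  const phaseStep = (2 * Math.PI * frequencyHz) / sampleRate;
  for (let frame = 0; frame < frameCount; frame++) {
    const value = amplitude * Math.sin(phaseStep * frame);
    samples[frame * channels] = value;     // left channel
    samples[frame * channels + 1] = value; // right channel
  }
  return samples;
}

// Usage: 2048 frames of a 440 Hz tone at 44.1 kHz.
const tone = makeStereoTone(440, 44100, 2048);
console.log(tone.length); // 4096 (2048 frames x 2 channels)
```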

Along with the suggestion to use typed arrays came a less welcome one: remove the FFT calculation from C++ and do it all in JavaScript. When I suggested this in our #audio IRC channel, a lot of people were upset, saying it was a bad idea, that we'd never be fast enough, etc. I pulled it out anyway, in order to force them to try. Corban responded by rewriting his dsp.js library to use Float32Array, and it can now do 1000 FFTs on a 2-channel, 2048-sample buffer in 490 ms, or 0.49 ms per FFT (versus 2.577 ms per FFT with plain JS arrays, so roughly five times faster!). And one of the biggest critics of my decision to pull the native FFT, Charles Cliffe, went off to prove me wrong, but ended up with two stunning WebGL-based audio visualizations (demos here and here, videos here and here).
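Moving the FFT into script means something like the following has to run for every frame of audio. This is a minimal radix-2 Cooley-Tukey FFT over Float32Array in TypeScript, written only to show the shape of the work; it is not Corban's dsp.js code, and the names `fft` and `magnitudes` are illustrative. For scale, the quoted benchmark works out to 490 ms / 1000 = 0.49 ms per transform, and 2.577 / 0.49 ≈ 5.3, so the typed-array version is a little over five times faster than plain arrays.

```typescript
// Minimal in-place radix-2 Cooley-Tukey FFT over Float32Arrays (a sketch,
// not dsp.js). `re` and `im` hold the real and imaginary parts; their shared
// length must be a power of two.
function fft(re: Float32Array, im: Float32Array): void {
  const n = re.length;
  if (n !== im.length || n === 0 || (n & (n - 1)) !== 0) {
    throw new Error("re and im must share a power-of-two length");
  }
  // Reorder the input into bit-reversed index order.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      let t = re[i]; re[i] = re[j]; re[j] = t;
      t = im[i]; im[i] = im[j]; im[j] = t;
    }
  }
  // Butterfly passes: merge pairs of half-size transforms into size `size`.
  for (let size = 2; size <= n; size <<= 1) {
    const half = size >> 1;
    const step = (-2 * Math.PI) / size;
    for (let start = 0; start < n; start += size) {
      for (let k = 0; k < half; k++) {
        const wr = Math.cos(step * k); // twiddle factor e^(-2*pi*i*k/size)
        const wi = Math.sin(step * k);
        const a = start + k;
        const b = a + half;
        const tr = wr * re[b] - wi * im[b];
        const ti = wr * im[b] + wi * re[b];
        re[b] = re[a] - tr;
        im[b] = im[a] - ti;
        re[a] += tr;
        im[a] += ti;
      }
    }
  }
}

// Magnitude of each bin in the first half of the spectrum (real input).
function magnitudes(re: Float32Array, im: Float32Array): Float32Array {
  const out = new Float32Array(re.length / 2);
  for (let k = 0; k < out.length; k++) out[k] = Math.hypot(re[k], im[k]);
  return out;
}

// Usage: a 2048-sample sine that completes exactly 64 cycles.
const fftSize = 2048;
const real = new Float32Array(fftSize);
const imag = new Float32Array(fftSize); // zero imaginary part for real input
for (let i = 0; i < fftSize; i++) real[i] = Math.sin((2 * Math.PI * 64 * i) / fftSize);
fft(real, imag);
const spectrum = magnitudes(real, imag);
let peakBin = 0;
for (let k = 1; k < spectrum.length; k++) if (spectrum[k] > spectrum[peakBin]) peakBin = k;
console.log(peakBin); // 64
```

A production library would typically precompute the twiddle factors into lookup tables once per FFT size rather than calling `Math.cos` and `Math.sin` inside the inner loop; this sketch keeps them inline for readability.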

What I like most about these (other than the fact that he wrote the music, the JS libraries, and the demos himself) is that they combine a whole bunch of JavaScript libraries: dsp.js, cubicvr.js, beatdetection.js, and processing.js. Some people will tell you that doing anything complex in a browser is going to be slow, but Charles is masterfully proving that you can do many, many things at once and the browser can keep pace.

Corban and Ricard Marxer have been busy exploring how far we can push writing audio, and they have also produced some amazing demos. The first, by Ricard, is a graphic equalizer (video is here):
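How Ricard's equalizer is implemented isn't covered here, but a common building block for each band of a graphic equalizer is a peaking biquad filter. The TypeScript sketch below uses the coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook; the `PeakingBand` class, the band frequencies, and the gains are illustrative assumptions, not Ricard's code.

```typescript
// One band of a graphic equalizer: a peaking biquad using the coefficient
// formulas from Robert Bristow-Johnson's Audio EQ Cookbook, run as a
// Direct Form I difference equation over a mono Float32Array.
class PeakingBand {
  private b0: number;
  private b1: number;
  private b2: number;
  private a1: number;
  private a2: number;
  // Previous two inputs and outputs, kept between calls so that
  // consecutive buffers are filtered as one continuous signal.
  private x1 = 0;
  private x2 = 0;
  private y1 = 0;
  private y2 = 0;

  constructor(sampleRate: number, centerHz: number, gainDb: number, q = 1) {
    const A = Math.pow(10, gainDb / 40);
    const w0 = (2 * Math.PI * centerHz) / sampleRate;
    const alpha = Math.sin(w0) / (2 * q);
    const cosW0 = Math.cos(w0);
    const a0 = 1 + alpha / A;
    // All coefficients are normalised by a0.
    this.b0 = (1 + alpha * A) / a0;
    this.b1 = (-2 * cosW0) / a0;
    this.b2 = (1 - alpha * A) / a0;
    this.a1 = (-2 * cosW0) / a0;
    this.a2 = (1 - alpha / A) / a0;
  }

  // Filters the buffer in place.
  process(samples: Float32Array): void {
    for (let i = 0; i < samples.length; i++) {
      const x0 = samples[i];
      const y0 =
        this.b0 * x0 + this.b1 * this.x1 + this.b2 * this.x2 -
        this.a1 * this.y1 - this.a2 * this.y2;
      this.x2 = this.x1;
      this.x1 = x0;
      this.y2 = this.y1;
      this.y1 = y0;
      samples[i] = y0;
    }
  }
}

// Usage: a five-band equalizer is several bands applied in series.
// The [centre Hz, gain dB] pairs here are arbitrary examples.
const bandSettings: Array<[number, number]> = [
  [60, 3], [250, 0], [1000, -2], [4000, 0], [12000, 4],
];
const eqBands = bandSettings.map(([hz, db]) => new PeakingBand(44100, hz, db));
function equalize(buffer: Float32Array): void {
  for (const band of eqBands) band.process(buffer);
}
```

At 0 dB a band passes the signal through unchanged, and at its centre frequency the gain is 10^(gainDb/20), so a +6 dB band roughly doubles the amplitude of a tone at that frequency.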

The second, by Corban, is a JavaScript-based audio sampler. His code can loop forward or backward, change playback speed, and more (video is here):
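The post doesn't describe how Corban's sampler works internally. One simple way to get looping, reverse playback, and variable speed is to keep a fractional read position, advance it by a rate for each output sample (a negative rate plays backwards), wrap it inside the loop, and linearly interpolate between neighbouring samples. The TypeScript sketch below does exactly that; `renderLoop` and `LoopSettings` are hypothetical names, and linear interpolation is an assumption here (higher-quality resamplers use better interpolation).

```typescript
// Hypothetical sampler sketch: renders `outLength` samples from a mono `source`
// buffer, moving `rate` source samples per output sample (negative = reverse),
// wrapping inside [loopStart, loopEnd) and linearly interpolating between
// neighbouring samples when the read position falls between them.
interface LoopSettings {
  rate: number;      // 1 = original speed, 0.5 = half speed, -1 = reverse
  start: number;     // initial read position, in source samples
  loopStart: number; // first sample of the loop (inclusive, integer)
  loopEnd: number;   // end of the loop (exclusive, integer)
}

function renderLoop(source: Float32Array, outLength: number, s: LoopSettings): Float32Array {
  const loopLength = s.loopEnd - s.loopStart;
  if (s.loopStart < 0 || s.loopEnd > source.length || loopLength < 1) {
    throw new RangeError("loop must lie inside the source buffer");
  }
  const out = new Float32Array(outLength);
  let pos = s.start;
  for (let i = 0; i < outLength; i++) {
    // Wrap the position into the loop; % keeps the sign, so fix negatives.
    let offset = (pos - s.loopStart) % loopLength;
    if (offset < 0) offset += loopLength;
    if (offset >= loopLength) offset = 0; // guard against rounding up to loopLength
    pos = s.loopStart + offset;
    const index = Math.floor(pos);
    const frac = pos - index;
    // Past the loop's last sample, interpolate towards its first sample.
    const next = index + 1 < s.loopEnd ? index + 1 : s.loopStart;
    out[i] = source[index] * (1 - frac) + source[next] * frac;
    pos += s.rate;
  }
  return out;
}

// Usage: play a 0..7 ramp backwards at normal speed, looping the whole buffer.
const ramp = Float32Array.from([0, 1, 2, 3, 4, 5, 6, 7]);
const reversed = renderLoop(ramp, 10, { rate: -1, start: 7, loopStart: 0, loopEnd: 8 });
console.log(Array.from(reversed).join(",")); // 7,6,5,4,3,2,1,0,7,6
```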

Chris McCormick has been working on porting Pure Data to JavaScript, and already has some basic components built. Here's one that combines processing.js and webpd (video is here):

I think my favourite demo by far this time around is one I've been waiting to see since we first began these experiments. I've written in the past that our work could be used to solve many web accessibility problems. A few weeks ago I mentioned on IRC that, now that we have typed arrays, someone should take a shot at building a text-to-speech engine in JavaScript. Yury quietly went off and built one based on the Flite engine. When you run this, remember that you're watching a browser speak with no plugins of any kind; it's all done in JavaScript (demo is here, video is here):

To do this he had to overcome some interesting problems, for example how to load large binary voice databases into the page. The straightforward approach of using a JS array was brittle: JS sometimes ran out of stack space trying to initialize the array. After trying various obvious alternatives, Yury decided to use the web to his advantage. He packed the binary data into a PNG and loaded it into a canvas, where getImageData lets him access the bytes very quickly through another typed array. The browser takes care of downloading and decompressing the data automatically. Here's what the database looks like:
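The post doesn't say exactly how the bytes are laid out in the image, so here is a minimal TypeScript sketch under an assumed layout: three payload bytes per pixel in the R, G, and B channels, with alpha fixed at 255. Keeping every pixel fully opaque matters because canvas pixel data may be stored premultiplied by alpha, which can alter the RGB values of translucent pixels; for similar reasons the PNG should avoid gamma or colour-profile chunks that might make the browser colour-correct the decoded pixels. `packBytesToRGBA` and `unpackBytesFromRGBA` are hypothetical helpers; only `drawImage` and `getImageData` are real canvas APIs, and in current browsers `ImageData.data` is a `Uint8ClampedArray`.

```typescript
// Hypothetical packer (the offline step): stores 3 payload bytes per pixel in
// R, G and B, and sets alpha to 255 so premultiplication can't alter them.
function packBytesToRGBA(bytes: Uint8Array): Uint8ClampedArray {
  const pixelCount = Math.ceil(bytes.length / 3);
  const pixels = new Uint8ClampedArray(pixelCount * 4);
  for (let i = 0; i < bytes.length; i++) {
    pixels[Math.floor(i / 3) * 4 + (i % 3)] = bytes[i];
  }
  for (let p = 0; p < pixelCount; p++) pixels[p * 4 + 3] = 255; // fully opaque
  return pixels;
}

// Hypothetical unpacker: recovers `byteLength` payload bytes from RGBA pixel
// data laid out as above, skipping every alpha byte.
function unpackBytesFromRGBA(pixels: Uint8ClampedArray, byteLength: number): Uint8Array {
  const available = (pixels.length / 4) * 3;
  if (byteLength > available) throw new RangeError("image too small for requested byte length");
  const out = new Uint8Array(byteLength);
  for (let i = 0; i < byteLength; i++) {
    const pixel = Math.floor(i / 3);
    const channel = i % 3; // 0 = R, 1 = G, 2 = B
    out[i] = pixels[pixel * 4 + channel];
  }
  return out;
}

// In a page, the pixels would come from a canvas, for example:
//   ctx.drawImage(img, 0, 0);
//   const pixels = ctx.getImageData(0, 0, img.width, img.height).data;
//   const voiceData = unpackBytesFromRGBA(pixels, knownByteLength); // length stored elsewhere
// Outside a browser, a round trip shows the layout:
const payload = Uint8Array.from([0, 1, 127, 128, 254, 255, 42]);
const image = packBytesToRGBA(payload);
console.log(unpackBytesFromRGBA(image, payload.length).join(",")); // 0,1,127,128,254,255,42
```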

What began as a series of experiments by a small group of strangers has now turned into something much larger. Our community continues to grow, and the scope and scale of the projects being built on our API keep increasing. At the same time, through the work of Doug Schepers and Chris Blizzard, we've managed to get the attention of the W3C, which has now started an Audio Incubator Group to look at how to standardize this work. One of my colleagues in these experiments, Al MacDonald, has been asked to chair the group, which already has members from Mozilla, Google, and the BBC. You can get involved, and follow @AudioXG for updates.

If you'd like to stay connected to this work, you can follow this bug, where I'll be posting a patch for review in the next week or so (the current patch is here). You can also read our Audio Data API documentation, with tutorials and examples (if you've looked at it before, note that it was recently rewritten from scratch). You can grab builds there too; I'm making them now, and they should be ready in the next few hours.