Embedding license data in images

This term I've got a good group of students taking my second open source course. Unlike the first course, which aims to teach the theory and practice of open source, the second course aims to get students engaged with a larger piece of work.

During the fall I was chatting with Ryan Merkley about some of the technical problems faced by Creative Commons. One that seemed like a good fit for the students was how to create a stronger bond between image files and their CC licenses. Because a CC-licensed image is meant to move around the web and get reused in new contexts, license data that lives in a surrounding web page is hard to keep connected to the image over time. I know I'm guilty of just right-clicking images and copy/pasting or saving them.

What would be interesting is if the license information could ride along with the image itself. All that's needed is a simple way (i.e., a library or set of libraries) for people to read and write that license data in an arbitrary image file. Maybe this happens in a camera app on Android, or maybe it's offered as a web service: you upload an image, pick a license, and get back your original image with the license data stamped inside it. Maybe it even finds its way into browsers some day, so you can get at the license info just by interacting with the image in a page.

It turns out that all of the major image formats support embedded textual data, which decoders are free to ignore (and usually do): the PNG spec allows for Textual Chunks, which are key-value pairs of plain or compressed text; the JPEG spec allows for Comment and Application Data segments; and GIFs allow Comments.

The goal of the project will be to create an open source library, or set of libraries, that can be used in desktop and mobile environments and gives developers a simple way to encode and decode license information in images. A number of approaches to holistic image metadata embedding have been discussed and developed in the past. This project will be informed by those recommendations, but will focus on licensing information rather than the broad spectrum of image metadata that is possible. Over time, the scope of such a library could expand to include other digital works beyond images.
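An encode/decode pair for JPEG could follow the same shape using a Comment (COM, marker 0xFFFE) segment. The sketch below is a minimal illustration under several assumptions: the file is well formed with no fill bytes between markers, the comment is stored as UTF-8 (the JPEG spec does not define a text encoding for COM), and the new segment goes after any leading APPn segments so that JFIF/Exif headers stay first. The function names are hypothetical.

```python
import struct

SOI = b"\xff\xd8"
COM = 0xFE
SOS = 0xDA


def _segments(jpeg: bytes):
    """Yield (offset, marker, payload) for header segments, stopping at SOS."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        # The 2-byte length includes itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        yield pos, marker, jpeg[pos + 4:pos + 2 + length]
        if marker == SOS:
            return  # entropy-coded image data follows; stop scanning
        pos += 2 + length


def write_jpeg_license(jpeg: bytes, text: str) -> bytes:
    data = text.encode("utf-8")  # assumption: UTF-8 comment text
    if len(data) > 65533:
        raise ValueError("text too long for a single COM segment")
    insert_at = 2  # default: right after SOI
    for pos, marker, payload in _segments(jpeg):
        if 0xE0 <= marker <= 0xEF:  # keep APP0..APP15 (JFIF, Exif) first
            insert_at = pos + 4 + len(payload)
        else:
            break
    segment = bytes([0xFF, COM]) + struct.pack(">H", len(data) + 2) + data
    return jpeg[:insert_at] + segment + jpeg[insert_at:]


def read_jpeg_comments(jpeg: bytes):
    """Return the decoded text of every COM segment before the image data."""
    return [p.decode("utf-8", errors="replace")
            for _, m, p in _segments(jpeg) if m == COM]
```

Because a comment carries no key, a real library would likely need its own convention (or an APPn segment such as XMP) to tell a license apart from other comments.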

It will be interesting to see the decisions the students make about implementation, since a goal of the project is that such a library be useful across mobile platforms. Should it be written in Java and then automatically translated to Objective-C and JavaScript using j2objc and GWT, as Google does, or written in C++ and then wrapped for each platform?

There are numerous interesting theoretical and technical problems to be solved, and I look forward to digging into some of them with the students. I'll write more as we progress toward a solution.