Turning pictures into music

Driving home from a jazz gig one night, I was listening to BBC Radio 3. The program was about the music of Arnold Schoenberg, and the presenter was describing Schoenberg's use of the twelve-tone technique in his compositions. This involved writing down the twelve notes of the chromatic scale (in any order, but with no note repeated), and then performing mathematical operations on them to generate more and more rows of notes. The resulting music was therefore partly inspired (by Schoenberg's choice of notes and 'operations'), and partly generated.

At that time my day job involved working with digital images, and I had written software to manipulate them. I wondered how it would sound if the software were adapted to actually 'play' the images as sounds. Just as in Schoenberg's twelve-tone technique, I could choose the initial group of notes, but then use the digital images to provide the 'mathematical operations' on those notes, creating music from the images: turning pictures into sound.

On a computer, a picture is made up of pixels, and each pixel is made up of a combination of three colors: red, green and blue. To make all the different colors on the screen, the levels of red, green and blue are varied in each pixel. An orange pixel, for example, has a high level of red, a moderate level of green and a low level of blue.

There are many ways that you could turn these red, green and blue levels into notes, but I decided to go for the most obvious one, so that the resulting 'music' is a very direct representation of the pixels. The color levels are simply turned into musical notes: the brighter the color, the higher the note. Because we have three colors (red, green and blue), the music has "three-tone polyphony", i.e. three notes are always playing together.
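As a rough illustration of this "brighter is higher" mapping, here is a minimal Python sketch. The three-octave C major scale, the MIDI note numbers, the 0-255 range for color levels and the function names (`build_scale`, `level_to_note`, `pixel_to_chord`) are all assumptions made for the example; they are not taken from the original software.

```python
# Hypothetical sketch: scale, note range and names are assumptions.
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of C major within one octave


def build_scale(lowest_note=48, octaves=3):
    """Return MIDI note numbers for a C major scale, starting at lowest_note.

    48 is the C one octave below middle C (MIDI note 60).
    """
    notes = [lowest_note + 12 * o + step
             for o in range(octaves)
             for step in C_MAJOR]
    notes.append(lowest_note + 12 * octaves)  # finish on the top C
    return notes


def level_to_note(level, scale):
    """Map a color level (assumed 0-255) to a note: brighter means higher."""
    index = level * len(scale) // 256  # evenly sized brightness bands
    return scale[index]


def pixel_to_chord(rgb, scale):
    """Turn one (red, green, blue) pixel into three simultaneous notes."""
    return tuple(level_to_note(level, scale) for level in rgb)


scale = build_scale()
orange = (255, 165, 0)  # an example orange pixel
print(pixel_to_chord(orange, scale))
```

With these assumed settings the orange pixel gives the notes (84, 72, 48): a high note for red, a middle note for green and the lowest note for blue.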

Even a small photograph contains a huge number of pixels. For example, if we generated one note for every pixel in a 640 x 480 pixel photo, it would make 307,200 notes for each of the three colors! To reduce the number, we can first divide the picture up into a smaller number of rows and columns and find the average pixel value in each of the resulting blocks, as shown below. Each block is then played in turn (left to right, top to bottom).

[Images: the 'pool' photo, and the same photo averaged into blocks]

Photo divided into 12 rows by 16 columns and played, row by row, with a C major scale!

Play MIDI sample
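The block averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is represented as a plain list of rows of (red, green, blue) tuples, the function name `block_averages` is hypothetical, and the original software may well do this differently.

```python
# Hypothetical sketch: the data layout and function name are assumptions.
def block_averages(pixels, rows, cols):
    """Average an image into rows x cols blocks.

    pixels is a list of rows, each a list of (r, g, b) tuples.
    Assumes rows <= image height and cols <= image width.
    Returns the block colors in playing order: left to right, top to bottom.
    """
    height, width = len(pixels), len(pixels[0])
    blocks = []
    for block_row in range(rows):
        y0 = block_row * height // rows
        y1 = (block_row + 1) * height // rows
        for block_col in range(cols):
            x0 = block_col * width // cols
            x1 = (block_col + 1) * width // cols
            count = (y1 - y0) * (x1 - x0)
            sums = [0, 0, 0]
            for y in range(y0, y1):
                for x in range(x0, x1):
                    for channel in range(3):
                        sums[channel] += pixels[y][x][channel]
            # Integer average of each color channel over the block
            blocks.append(tuple(total // count for total in sums))
    return blocks


# Tiny 4 x 4 test image: left half red, right half blue
red, blue = (255, 0, 0), (0, 0, 255)
image = [[red, red, blue, blue] for _ in range(4)]
print(block_averages(image, 2, 2))
```

For a 640 x 480 photo divided into 12 rows by 16 columns, each block covers 40 x 40 pixels, so there are only 192 blocks to play instead of 307,200 pixels.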

On these pages there are more examples of 'music' created with p2s, and full details of how it works. You can download p2s and/or the original source code and play with it, and perhaps get involved in a philosophical discussion: what is music? :-) And you can see what a well-known jazz tune looks like when turned back into a picture!