The ARSS

Examples

Here are some examples of what you can obtain with the ARSS 0.2.3. The quality of the results presented here does not reflect the ever-improving quality of future releases of the program.

The following two examples demonstrate the ARSS's ability to reproduce a sound from its spectrogram. In each example, the first item is the original sound (re-encoded in MP3), the middle item is the full image obtained by analysing that sound (re-encoded in PNG and possibly slightly edited for visibility), and the last item is the sound obtained by synthesis from that image alone.

Units :
- bpo : Bands per octave. This is the frequency resolution. For example, 24 bpo means there are 24 pixels vertically for each octave, which implies that the distance between two adjacent pixels is half a semitone.
- pps : Pixels per second. This is the time resolution. For example, 150 pps means there are 150 pixels horizontally for each second, which implies that the distance between two adjacent pixels is 1/150th of a second.
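
Both units translate directly into pixel spacing. The short Python sketch below spells out that arithmetic. The helper names are hypothetical, and it assumes that band 0 sits exactly at the base frequency, with each further band 1/bpo of an octave above the previous one; the exact row convention the ARSS uses is not stated on this page.

```python
def semitones_per_band(bpo: float) -> float:
    """Vertical distance between two adjacent pixels, in semitones (12 per octave)."""
    return 12.0 / bpo


def seconds_per_column(pps: float) -> float:
    """Horizontal distance between two adjacent pixels, in seconds."""
    return 1.0 / pps


def band_frequency(base_hz: float, band: int, bpo: float) -> float:
    """Frequency of the band `band` steps above the base on a log scale.
    Assumes (hypothetically) that band 0 is exactly the base frequency."""
    return base_hz * 2.0 ** (band / bpo)


print(semitones_per_band(24))        # 0.5 -> half a semitone
print(seconds_per_column(150))       # about 0.00667 s, i.e. 1/150th of a second
print(band_frequency(27.5, 48, 48))  # 55.0 -> one octave (48 bands) above A0
```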

Caption | Original sound | Produced spectrogram | Resynthesised sound

Johann Strauss II's The Blue Danube

38-second classical music extract

Thanks to the brightness correction, which lowers the sensitivity floor of spectrograms from -48 dB to -96 dB, all of the instruments' harmonics are reproduced intact. The relatively high frequency resolution also plays an important part in the quality of the resynthesis.
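
The -48 dB and -96 dB figures follow from how a square root acts on decibels: since 20*log10(sqrt(a)) = 10*log10(a), taking the square root of an amplitude halves its level in dB. The sketch below assumes, as an illustration not stated on this page, that pixels are stored with 8 bits per channel, whose darkest non-black level (1/255) is about -48 dB. The function names are hypothetical, and squaring before synthesis is assumed to be the inverse of the correction.

```python
import math


def to_db(amplitude: float) -> float:
    """Amplitude ratio (1.0 = full scale) expressed in decibels."""
    return 20.0 * math.log10(amplitude)


def from_db(db: float) -> float:
    """Decibels back to an amplitude ratio."""
    return 10.0 ** (db / 20.0)


def brighten(amplitude: float) -> float:
    """Square-root brightness correction at analysis: halves the level in dB."""
    return math.sqrt(amplitude)


def darken(pixel: float) -> float:
    """Assumed inverse before synthesis: squaring restores the original level."""
    return pixel * pixel


quiet = from_db(-96.0)                  # a harmonic 96 dB below full scale
stored = brighten(quiet)                # the value that would be written to the image
print(round(to_db(stored), 6))          # -48.0
print(round(to_db(darken(stored)), 6))  # -96.0
print(round(to_db(1 / 255), 1))         # -48.1, darkest non-black 8-bit level (assumed format)
```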

Parameters :
- Base frequency : 27.5 Hz (A0)
- Ceiling frequency : 19,912 Hz (D#10)
- Frequency resolution : 48 bpo
- Time resolution : 100 pps
- Synthesis mode : sine
- Brightness correction : 2 (square root function)
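
The note names next to the base and ceiling frequencies use scientific pitch notation. As a quick check, assuming equal temperament with A4 = 440 Hz, this sketch converts note names to frequencies and counts how many bands the range above spans at 48 bpo: 9.5 octaves, so about 456 bands. Whether the image is exactly that many pixels tall depends on how the ARSS counts the end points, which is not stated here; the helper names are hypothetical.

```python
import math

# Semitone offset of each note name from A within the same octave number.
NOTE_OFFSETS = {"C": -9, "C#": -8, "D": -7, "D#": -6, "E": -5, "F": -4,
                "F#": -3, "G": -2, "G#": -1, "A": 0, "A#": 1, "B": 2}


def note_to_hz(name: str, octave: int) -> float:
    """Equal-tempered frequency in scientific pitch notation (A4 = 440 Hz)."""
    semitones_from_a4 = NOTE_OFFSETS[name] + 12 * (octave - 4)
    return 440.0 * 2.0 ** (semitones_from_a4 / 12.0)


def band_count(base_hz: float, ceiling_hz: float, bpo: float) -> float:
    """Number of bands between base and ceiling on a log-frequency scale."""
    return math.log2(ceiling_hz / base_hz) * bpo


print(note_to_hz("A", 0))           # 27.5
print(round(note_to_hz("D#", 10)))  # 19912
print(round(band_count(note_to_hz("A", 0), note_to_hz("D#", 10), 48)))  # 456
```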

HAL 9000

"I'm sorry Dave, I'm afraid I can't do that"

Parameters :
- Base frequency : 55 Hz (A1)
- Ceiling frequency : about 7,400 Hz
- Frequency resolution : 24 bpo
- Time resolution : 150 pps
- Synthesis mode : noise


The following examples show what kinds of sounds one can obtain by synthesising images that were not produced by analysing a sound.

Caption | Original spectrogram | Synthesised sound

HAL 9000 hand-drawn in Photoshop

This spectrogram has been created in about 15 minutes in Photoshop with the brush tool by following the lines and imitating the other features of the HAL 9000 spectrogram presented previously.

We can understand quite distinctly what the voice says, which is almost surprising considering how quickly and carelessly the drawing was done. This leads me to think that one could easily learn how to draw every phoneme, and thus create clear speech from scratch.

Parameters :
- Base frequency : 220 Hz (A3)
- Ceiling frequency : about 7,900 Hz
- Frequency resolution : 24 bpo
- Time resolution : 150 pps
- Synthesis mode : noise

Roboty tune made from DNA gel

Few real-world pictures fed to the ARSS come out as interesting sounds, and this photograph of a DNA gel (originally taken from this page) is one of them.

Thanks to its short horizontal lines, neatly stacked vertically on a black background, this picture turns into a series of short, distinct notes that make up a strangely catchy, robotic-sounding melody.

Parameters :
- Base frequency : 27.5 Hz (A0)
- Ceiling frequency : 19,912 Hz (D#10)
- Frequency resolution : 24 bpo
- Time resolution : 150 pps
- Synthesis mode : noise
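
The DNA gel example works because, in a spectrogram read this way, each row stands for a fixed frequency and each column for a 1/pps slice of time, so a short horizontal line becomes a short steady tone. The naive additive synthesiser below illustrates that mapping only. It is not the ARSS's own synthesis code (this example actually used noise mode), the function name is hypothetical, rows are listed bottom-up (row 0 at the base frequency) unlike in a stored image, and amplitudes switch abruptly at column boundaries where a real implementation would smooth them.

```python
import math


def image_to_samples(image, base_hz, bpo, pps, sample_rate=8000):
    """Naive additive sine synthesis of a log-frequency spectrogram.

    image[row][col] holds an amplitude in 0..1; row 0 is the base frequency
    and each row above it is 1/bpo octave higher. Each column lasts 1/pps s.
    """
    width = len(image[0])
    samples_per_col = int(round(sample_rate / pps))
    out = [0.0] * (width * samples_per_col)
    for row, line in enumerate(image):
        freq = base_hz * 2.0 ** (row / bpo)
        step = 2.0 * math.pi * freq / sample_rate  # phase increment per sample
        for n in range(len(out)):
            amp = line[n // samples_per_col]  # pixel covering this sample
            if amp:
                out[n] += amp * math.sin(step * n)
    return out


# One lit pixel in the middle column: silence, a short A0 tone, silence.
samples = image_to_samples([[0, 1, 0], [0, 0, 0]], 27.5, 24, 100)
print(len(samples))  # 240 samples = 3 columns x 80 samples
```

A row of lit pixels several columns long would simply extend that tone, which is what the stacked gel bands do.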

The following effects were obtained simply by resynthesising the original sound's unedited spectrogram with different synthesis parameters.

Caption | Original sound | Produced spectrogram | Resynthesised sound

Time stretching : slowing down

Scatman John's scat slowed down 5 times

This effect is achieved simply by changing the time resolution setting for resynthesis. The frequency resolution was set to the lowest usable value to obtain the best possible time resolution, which is crucial when slowing a sound down. Note how different, and how much more natural, the result sounds compared with the same effect achieved in Adobe Audition 1.5.

Parameters :
- Base frequency : 27.5 Hz (A0)
- Ceiling frequency : about 20,000 Hz
- Frequency resolution : 6 bpo
- Time resolution : 300 pps => 60 pps
- Synthesis mode : noise
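
Because each row is tied to a frequency rather than to playback speed, resynthesising at a different pps changes the duration while leaving the pitch alone. The stretch factor is simply the ratio of the two pixel rates, as these hypothetical helpers show for the two time-stretching examples on this page.

```python
def stretch_factor(analysis_pps: float, synthesis_pps: float) -> float:
    """How many times longer the resynthesised sound lasts: each column was
    1/analysis_pps s long and is now played for 1/synthesis_pps s."""
    return analysis_pps / synthesis_pps


def stretched_duration(seconds: float, analysis_pps: float, synthesis_pps: float) -> float:
    """Duration after resynthesis: number of columns divided by the new pps."""
    columns = seconds * analysis_pps
    return columns / synthesis_pps


print(stretch_factor(300, 60))          # 5.0  -> five times slower
print(stretch_factor(2, 200))           # 0.01 -> a hundred times faster
print(stretched_duration(10, 300, 60))  # 50.0 -> a 10 s clip becomes 50 s
```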

Time stretching : speeding up

President Bush's 2008 State of the Union Address sped up 100 times

To speed a sound up a hundred times, most other time-stretching algorithms simply cut it into tiny chunks and keep one chunk out of every 100, much as image editing programs can shrink a picture using nearest-neighbour interpolation. The ARSS instead filters the information properly, keeping everything that could still be heard at such speeds. In this example we can make out two main components : the bubbly sound, which is the president's speech, and the short noises, which are the audience applauding.

Parameters :
- Base frequency : 27.5 Hz (A0)
- Ceiling frequency : about 9,000 Hz
- Frequency resolution : 60 bpo
- Time resolution : 2 pps => 200 pps
- Synthesis mode : noise
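
The difference between discarding chunks and filtering can be illustrated on the energy of a single frequency band over time. This is a simplified sketch of the general idea, assuming a plain box average as the filter; the page does not say which filter the ARSS actually applies when analysing at 2 pps, and the helper names are hypothetical.

```python
def decimate(frames, factor):
    """Keep one frame out of every `factor` (nearest-neighbour style)."""
    return frames[::factor]


def average(frames, factor):
    """Average each group of `factor` consecutive frames (box filter)."""
    groups = [frames[i:i + factor] for i in range(0, len(frames), factor)]
    return [sum(g) / len(g) for g in groups]


# Energy of one band over 8 frames, with a brief burst (e.g. a clap) at frame 5.
band = [0, 0, 0, 0, 0, 4, 0, 0]
print(decimate(band, 4))  # [0, 0]      -> the burst is lost
print(average(band, 4))   # [0.0, 1.0]  -> the burst survives, spread over its slot
```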

Interval stretching

Samuel Barber's Adagio for Strings stretched out by a factor of 2

This effect is, I believe, completely new (however, if you think you've heard of such a thing before, I'd be delighted to hear about it). For that reason it's also a bit difficult to explain, so please bear with me. While pitch shifting moves notes around but leaves the intervals between them intact, this technique compresses or stretches the intervals between notes proportionally across the whole spectrum. This is equivalent to taking a score and multiplying the distance in semitones between every pair of notes by the same factor.

So, for example, if you stretched the notes C3-D3-G3 by a factor of 2 this way, you might obtain the notes C3-E3-D4, or, depending on other settings, A5-C#6-B6. The important point is that the interval between any two notes is doubled; in our precise example, we stretch our sound from 4.77 octaves to 9.53 octaves. While I chose here to double intervals for harmonic reasons, you can also choose to reduce them. It usually turns anything into eerie-sounding dissonant music.

Parameters :
- Base frequency : 110 Hz (A2) => 27.5 Hz (A0)
- Ceiling frequency : about 3,000 Hz => 20,378 Hz
- Frequency resolution : 120 bpo => 60 bpo
- Time resolution : 50 pps
- Synthesis mode : noise
- Brightness correction : 2 (square root function)
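
On a log-frequency spectrogram, synthesising the same rows at half the bpo doubles every interval: 4.77 octaves at 120 bpo occupy about 572 rows, which at 60 bpo span about 9.53 octaves. The sketch below reproduces the score-level view of the effect on the C3-D3-G3 example, with intervals measured in semitones from the first note. The names are hypothetical, and the optional new root only stands in for the change of base frequency.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def to_semitone(name, octave):
    """Semitone index in scientific pitch notation, with C0 = 0."""
    return 12 * octave + NAMES.index(name)


def to_name(semitone):
    return f"{NAMES[semitone % 12]}{semitone // 12}"


def stretch_intervals(notes, factor, new_root=None):
    """Multiply the interval from the first note to every note by `factor`.
    new_root optionally moves the first note (standing in for a new base)."""
    root = to_semitone(*notes[0])
    target = root if new_root is None else to_semitone(*new_root)
    return [to_name(target + round(factor * (to_semitone(*n) - root)))
            for n in notes]


chord = [("C", 3), ("D", 3), ("G", 3)]
print(stretch_intervals(chord, 2))            # ['C3', 'E3', 'D4']
print(stretch_intervals(chord, 2, ("A", 5)))  # ['A5', 'C#6', 'B6']
```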

The following example shows how an image editing program can be used to achieve things previously impossible in sound processing.

Caption | Original sound | Original spectrograms | Edited spectrograms | Synthesised sounds

Instruments and vocals separation

A 1970s funk loop (George Duke's Reach For It) broken down into layers

The separation of each instrument was achieved mainly by painting black with the brush tool (in Photoshop) around the features of interest. Once again, one of the main obstacles was resolution, which made it necessary to use two spectrograms of the same sound at different resolutions; another was that the instruments overlap even in the image. The drums were the biggest source of problems, as their noisy features spread all over the spectrogram and mix with everything else. However, thanks to the power of image editing techniques, we can achieve a quality of separation that traditional sound processing techniques cannot come close to.

Parameters :
- Base frequency : 27.5 Hz (A0)
- Ceiling frequency : about 20,000 Hz
- Frequency resolution : 6 and 24 bpo
- Time resolution : 150 pps
- Synthesis mode : noise for the drums, sine for everything else
- Brightness correction : 2 (square root function)
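
Painting black around a feature amounts to multiplying the spectrogram by a mask that is 1 over the feature and 0 elsewhere. The hypothetical helper below keeps a rectangular region only; real hand-painted masks follow the shape of each feature, and, as noted above for the drums, no mask can cleanly separate features that overlap in the image.

```python
def keep_region(spec, rows, cols):
    """Return a copy of `spec` in which every pixel outside the half-open
    row range `rows` and column range `cols` is painted black (0)."""
    r0, r1 = rows
    c0, c1 = cols
    return [[v if r0 <= r < r1 and c0 <= c < c1 else 0
             for c, v in enumerate(line)]
            for r, line in enumerate(spec)]


spec = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(keep_region(spec, (1, 3), (0, 2)))  # [[0, 0, 0], [4, 5, 0], [7, 8, 0]]
```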

The following example demonstrates how images can be transmitted over sound nearly losslessly under ideal conditions.

Caption | Original image | Transmitted sound | Transmission result

Basic black and white image transmission

Lena transmitted over MP3

Thanks to linear frequency scaling (--linear) and sine synthesis we can now use the ARSS to transmit or store images as sound with hardly any loss in quality, provided that we do this under ideal conditions.

There are a few things to note about this example. The final image is produced from the actual MP3 rather than from a lossless copy of the synthesised sound, and such a sound contains far more information than regular music, much more than the MP3 format was designed for. The MP3 is therefore encoded at four times the bitrate usually used for music. With a lower bitrate the image would have been very noisy, and with an even lower one entire chunks of the image would have been blacked out.

It may also be of interest that this method of image transmission is as efficient as the method used for analog black and white television, which means that we could theoretically transmit TV programs this way within the same bandwidth as analog television, and with the same quality, under ideal conditions. One interesting aspect of this technique is that images transmitted like this can be picked up and viewed by anyone with a spectrograph. Given the arguable universality of mathematics and time-frequency analysis, one may even argue that it would be a good way of transmitting images to potential extraterrestrial civilisations, as we may expect them to be acquainted with such analysis techniques and to use them when analysing unusual signals from outer space.

Back on Earth, you could try the following. Ask someone you know to give you a phone call and to play this sound. Record it, analyse it (with the following parameters : 300 Hz to 3400 Hz, height 256 pixels, 10 pps, linear) and if you see anything you recognize please send it to me (I don't have a telephone and I'd like to know how well it works).

Parameters :
- Base frequency : 20 Hz
- Ceiling frequency : 18,000 Hz
- Frequency resolution : about 35 Hz/line
- Time resolution : 32 pps
- Synthesis mode : sine
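
With --linear, rows are evenly spaced in hertz rather than in octaves, so the frequency step per row is just the frequency range divided by the image height, and the transmission time is the image width divided by the pps. The sketch below assumes the standard 512 x 512 Lena test image (its size is not given on this page) and an even division of the range over the rows; the ARSS may count rows slightly differently, and the helper names are hypothetical. It also applies the same arithmetic to the telephone parameters suggested above.

```python
def hz_per_line(base_hz: float, ceiling_hz: float, height: int) -> float:
    """Frequency step between adjacent rows on a linear scale, assuming the
    base-to-ceiling range is divided evenly over `height` rows."""
    return (ceiling_hz - base_hz) / height


def transmission_seconds(width: int, pps: float) -> float:
    """Length of sound needed to send `width` columns at `pps`."""
    return width / pps


# Lena, assumed 512 x 512: 20 Hz to 18,000 Hz at 32 pps
print(round(hz_per_line(20, 18000, 512), 1))  # 35.1 Hz per line
print(transmission_seconds(512, 32))          # 16.0 seconds
# Telephone test: 300 Hz to 3400 Hz over 256 rows
print(round(hz_per_line(300, 3400, 256), 1))  # 12.1 Hz per line
```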


This site is in hiatus. Last updated on February 23rd, 2009
©2007-2009 Michel Rouzic