13 September 2009

Basic Vocoding with Photoshop




Photosounder shares a lot of principles with vocoders. Vocoders, as used in music to make robotic voices, work by slicing the input voice (the modulator) and the input tone (the carrier signal) into narrow bands of frequency, detecting the envelopes for the modulator's bands, and modulating the bands of the carrier with these envelopes. This makes the vocoded signal inherit the tone of the carrier signal and a number of characteristics of the modulator, which translates into intelligible speech that sounds like anything but a human voice.
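As a rough illustration of the envelope-and-modulate step described above, here is a minimal Python sketch. It assumes both signals have already been split into the same number of frequency bands of equal length (the band-pass filtering itself is left out), and the names `envelope`, `vocode` and the `smoothing` parameter are made up for this example rather than taken from any particular vocoder.

```python
def envelope(band, smoothing=0.99):
    """Estimate a band's envelope: rectify it, then smooth with a one-pole low-pass."""
    env = []
    level = 0.0
    for sample in band:
        # Move the running level a small step towards the rectified sample.
        level = smoothing * level + (1.0 - smoothing) * abs(sample)
        env.append(level)
    return env


def vocode(modulator_bands, carrier_bands, smoothing=0.99):
    """Scale each carrier band by the matching modulator envelope and sum the bands.

    Assumption: both arguments are lists of bands (lists of samples), already
    band-pass filtered, with the same band count and the same length.
    """
    length = len(carrier_bands[0])
    output = [0.0] * length
    for mod_band, car_band in zip(modulator_bands, carrier_bands):
        env = envelope(mod_band, smoothing)
        for i in range(length):
            output[i] += car_band[i] * env[i]
    return output
```

With a silent modulator every envelope stays at zero, so the carrier is muted entirely; the carrier only comes through in the bands, and at the moments, where the voice has energy.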

If you're familiar with how Photosounder works, you can probably draw the parallels. If not, this is how it works. Both Photosounder and vocoders cut the input signals into narrow bands of frequency, and Photosounder detects their envelopes to form an image that represents the sound. In lossless mode, Photosounder also keeps the filtered bands in memory so that it can modulate them with whatever image it is later given. Vocoding can therefore be done by multiplying the image of the modulator with the image of the carrier signal, then using the result in lossless mode with the original carrier signal as the reference sound, so that the modifications made to the carrier image (that is, the multiplication by the modulator image) are applied to the carrier signal.
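The image multiplication itself is just a pixel-by-pixel product, which is what a multiply blend amounts to when pixel values are scaled to the 0..1 range. A minimal sketch, assuming both images are grayscale, have the same dimensions, and are stored as lists of rows (one row per frequency band, one column per time step); `multiply_images` is a hypothetical helper, not part of Photosounder or Photoshop.

```python
def multiply_images(modulator, carrier):
    """Return the pixel-wise product of two equally sized grayscale images.

    Assumption: pixel values are floats in the 0..1 range, so the product
    also stays within 0..1.
    """
    same_height = len(modulator) == len(carrier)
    same_widths = all(len(m) == len(c) for m, c in zip(modulator, carrier))
    if not (same_height and same_widths):
        raise ValueError("images must have the same dimensions")
    return [
        [m * c for m, c in zip(mod_row, car_row)]
        for mod_row, car_row in zip(modulator, carrier)
    ]
```

Wherever the modulator image is black the product is black, so the carrier is silenced there; wherever the modulator is white the carrier image is left unchanged.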

However, traditional vocoders use a much lower resolution in the frequency domain: whereas Photosounder uses 571 frequency bands by default, vocoders typically use between 8 and 32 over the same range of frequencies. This means that whereas in Photosounder you can clearly distinguish each harmonic that makes up human speech, to a vocoder these are all fused together. And that's actually what we want, because we don't want to keep any information about the input voice's vocal cords. We want to replace the vocal cords with the carrier signal and apply to it the same treatment that the raw sound from the vocal cords received, the treatment that turned a meaningless "aaaaaaaaah" into intelligible speech.

The fix is to apply a vertical motion blur to the modulator's image in Photoshop, which smears the individual harmonics together. In the video I applied a vertical motion blur of 20 pixels three times. Also, since frequency resolution is not important here whereas time resolution is, I advise editing config.txt in the Photosounder folder and changing the value of min_bpo from 24 to 0.
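For readers without Photoshop, the blur can be approximated by averaging each pixel with its vertical neighbours. The sketch below is only a stand-in: it assumes a plain box average along each column with the window cut off at the image edges, which may not match Photoshop's motion blur kernel or edge handling exactly, and the function names are made up for this example.

```python
def vertical_box_blur(image, length):
    """Average each pixel with the pixels up to length // 2 rows above and below it.

    Assumption: `image` is a list of rows of floats. The window is truncated
    at the top and bottom edges rather than padded.
    """
    rows = len(image)
    cols = len(image[0])
    half = length // 2
    blurred = [[0.0] * cols for _ in range(rows)]
    for x in range(cols):
        for y in range(rows):
            first = max(0, y - half)
            last = min(rows, y + half + 1)  # exclusive upper bound
            total = sum(image[r][x] for r in range(first, last))
            blurred[y][x] = total / (last - first)
    return blurred


def smear_harmonics(image, length=20, passes=3):
    """Apply the vertical blur several times, mirroring the 20 pixels x 3 used above."""
    for _ in range(passes):
        image = vertical_box_blur(image, length)
    return image
```

Repeating a box blur gives a progressively smoother, more bell-shaped falloff than a single pass, while each column (each instant in time) is processed on its own, so time resolution is untouched.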

This, of course, is only basic vocoding. One could also stretch the modulator image around in all directions and in all sorts of ways before overlaying it with the carrier image. That will be the subject of my next blog entry, if I can find any example sounds that suit me.


©2008-2009 Michel Rouzic