14 November 2009

Interval expansion

The video shown above is an example of interval expansion, an effect that can only be achieved through spectrogram resynthesis. It consists of expanding the spacing in pitch between notes so that each interval doubles. Two notes one semitone apart therefore become two semitones apart, which creates a melody that sounds different from the original. This is done here by setting the analysis parameters so that only about 5 octaves of the original sound are analysed, then changing the frequency settings so that these 5 octaves are stretched in pitch across 10 octaves.

Calculating the frequencies

Two things need deciding: the expansion ratio we want (here 2) and the centre frequency, that is, the frequency that stays the same throughout the expansion. Here I chose A4 (440 Hz). We also want the output to cover the entire audible range, so the result should range from 20 Hz to 20,000 Hz.

With these parameters, we now need to calculate the minimum and maximum frequencies to use for analysis, which is what we'll put in config.txt prior to opening the sound. First, the maximum frequency. Because we will expand the sound in pitch by a factor of two, the interval (in octaves) between the centre frequency of 440 Hz and the maximum frequency for analysis must be half the interval between the centre frequency and the maximum frequency of the synthesised sound, which is 20,000 Hz. We'll use the following formula:

Fmax' = Fcentre * (Fmax / Fcentre)^(1 / ratio) which gives us here
Fmax' = 440 * (20,000 / 440)^(1 / 2) which you can type in Google to obtain the answer which is
Fmax' = 2966.479 Hz

The same formula gives the minimum frequency: replacing 20,000 with 20 gives 93.808 Hz.
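
As a quick check of the arithmetic above, here is a minimal Python sketch of the same formula. The function name analysis_frequency is made up for this illustration and is not part of Photosounder.

```python
import math

def analysis_frequency(f_out, f_centre=440.0, ratio=2.0):
    """Analysis frequency that ends up at f_out once intervals are expanded by `ratio`."""
    return f_centre * (f_out / f_centre) ** (1.0 / ratio)

max_frequency = analysis_frequency(20000.0)  # upper edge of the output range
min_frequency = analysis_frequency(20.0)     # lower edge of the output range

print(f"min_frequency = {min_frequency:.3f} Hz")
print(f"max_frequency = {max_frequency:.3f} Hz")

# The analysed span in octaves is the output span divided by the ratio.
print(f"analysed octaves: {math.log2(max_frequency / min_frequency):.2f}")
```

The last line should report a span of just under 5 octaves, which matches the "about 5 octaves" stretched across 10 mentioned earlier.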

Once you've entered those two values in the file config.txt for 'min_frequency' and 'max_frequency', you can load Photosounder, load your sound, and set the Min. and Max. Frequency knobs to 20 and 20,000 to obtain the desired result.

4 November 2009

New instrument isolation techniques

Over the last few weeks, Photosounder has gained new functions that allow instrument isolation to be done entirely in Photosounder, simply and quickly, without the use of any external program.

In this video we see how to extract a continuous synth line graphically using those new tools. This synthesiser is harmonic, which means it is composed of vertically stacked parallel lines, separated by constant vertical distances. It is crucial to be able to identify the position of the base frequency, which is the lowest of those lines. It might not always be easily seen, and sometimes it's entirely absent. To help you find that base frequency, you can, using any brush tool and the harmonics modifier (the button with four vertically stacked dots), hover over the image to see an overlay of the first few harmonics and with your mouse cursor try to match the cross hair overlay with the lines on the screen. This is fairly straightforward, however things can sometimes be a bit confusing. It is best to try what seems like the lowest possibility first in order to avoid confusing the first harmonic (the second line from the bottom) with the base frequency (the first line), but in the case of a chord, it is best to try to erase the higher notes first.

In this example there are no chords, and the base frequency is easily seen. The synth line is also quite strong compared to the rest of the instruments, meaning we can safely use the magnet modifier so that the cursor will effortlessly snap to the synth's curves.

Using the smart erase tool (represented by a road roller icon), the harmonics modifier that reproduces the smart erase tool's action on every harmonic, and the magnet modifier to snap to the curves, we can now erase the synth by spraying over it from left to right. The 'Tool intensity' should be set to 100% and the 'Spray width' anywhere between 10 and 20 pixels. You'll also most likely want to hold the H key during the erasure. This slows down the mouse cursor 32 times by default so as to give you more precision.

This gives you an image and sound practically devoid of the removed instrument. We want the opposite, which we obtain by pressing the Mask Invert button. Make sure the lossless mode is turned on for best results.

At this point the result might not be fully satisfactory, but this is most of the job done. Further work can be done to clean up or fix the image, including using external programs such as Adobe Photoshop or GIMP.

Horns, such as in this video, offer a different kind of challenge. Identifying the base frequency can be trickier, one reason being that the base frequency can be pretty low in pitch, which gives it a lower graphical resolution to work with. The lines that make up a horn note are also less smooth and regular. This in turn is an advantage, however, as it makes the result more forgiving of irregularities.

Because of these characteristics, it is recommended to change a few parameters in the file config.txt. In this video the min_bpo setting was changed from 24 to 12 to get a better time resolution in the area of the base frequencies of the horns. The pixels_per_second parameter was lowered from 100 to 50 because overall not so much time resolution is needed here. More importantly, the bands_per_octave parameter, which defines the vertical resolution, was increased from 60 to 180. This is because the harmonics of the horns reach quite high, and as harmonics go up they get closer to each other. With a bands_per_octave setting of 60, after the 30th harmonic or so all harmonics are merged together. Increasing that setting allows them to remain separated and hence more readily separable. For the same reason, the spray_harmonics setting, which defines how many harmonics the harmonics modifier works on, was increased from a default of 20 to 100.
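
To get a feel for why higher harmonics merge, here is a small Python sketch. It assumes the image rows are evenly spaced on a logarithmic frequency scale with bands_per_octave rows per octave, which is my reading of the setting's name rather than a documented fact. The function harmonic_spacing is made up for the illustration.

```python
import math

def harmonic_spacing(n, bands_per_octave):
    """Vertical distance, in image rows, between harmonic n and harmonic n + 1.

    Assumes a log-frequency image with `bands_per_octave` rows per octave.
    """
    return bands_per_octave * math.log2((n + 1) / n)

for bpo in (60, 180):
    for n in (1, 10, 30, 100):
        print(f"bands_per_octave={bpo:3d}  harmonics {n:3d} and {n + 1:3d}: "
              f"{harmonic_spacing(n, bpo):6.2f} rows apart")
```

Under that assumption, the 30th and 31st harmonics sit fewer than 3 rows apart at 60 bands per octave, and three times as far apart at 180.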

The instruments were removed in the same way as in the previous video, with a few exceptions. Firstly, it's a bit harder to identify the base frequencies of each note in this sample, and it was also harder to identify what belongs to which instrument, so it took a bit of trial and error. Also, the magnet modifier was turned off, for two reasons: because the instrument being removed was less strong the magnet modifier was less efficient, and because most notes are straight lines it's just as easy to follow them without the magnet modifier. The H key was still held down for precision.

After erasing the desired instruments and doing the Mask Invert, we can notice a couple of issues with the image and the sound. We can see and hear remains of the hi-hats which were caught up in the higher harmonics, and we also notice that the chords are much louder than the parts with single notes. The first issue can simply be solved using the dark spray tool, without any modifier turned on, and with a decreased Tool intensity. This editing is best done by temporarily turning up the Gamma so as to better see what's being done.

The second issue comes from the fact that when we remove chords we make as many passes as there are notes, and because there is much overlap in the harmonics it is equivalent to removing the same thing many times, which results in chords that are louder than they should be. This can be solved using the rectangle tool. Dragging an area on the screen with the right mouse button lightens it by the ratio defined by Tool intensity (the left mouse button darkens the area by the same ratio), so you can make the passages devoid of chords brighter and in turn louder.

As said earlier, this type of instrument is more forgiving than the smooth flat synth line in the first example, and it takes less work to obtain a satisfactory result. Again however, the results can be further improved.


13 September 2009

Basic Vocoding with Photoshop

Photosounder shares a lot of principles with vocoders. Vocoders, as used in music to make robotic voices, work by slicing the input voice (the modulator) and the input tone (the carrier signal) into narrow bands of frequency, detecting the envelopes for the modulator's bands, and modulating the bands of the carrier with these envelopes. This makes the vocoded signal inherit the tone of the carrier signal and a number of characteristics of the modulator, which translates into intelligible speech that sounds like anything but a human voice.

If you're familiar with how Photosounder works, you can probably draw the parallels. If not, this is how it works. Both Photosounder and vocoders cut the input signals into narrow bands of frequency, and Photosounder detects their envelope to form an image that represents the sound. In lossless mode, Photosounder also keeps the filtered bands in memory to modulate them with whatever image is later given as input. Therefore, vocoding can be done by multiplying the image of the modulator with the image of the carrier signal, and using the result in lossless mode with the original carrier signal as the reference sound, so that the modifications done to the carrier image (that is, the multiplication by the modulator image) are applied to the carrier signal.

However, traditional vocoders use a much lower resolution in the frequency domain. Whereas Photosounder uses 571 frequency bands by default, vocoders typically use between 8 and 32 over the same range of frequencies. This means that whereas in Photosounder you can clearly distinguish each harmonic that makes up human speech, to a vocoder these are all fused together. And that's actually what we want, because we don't want to keep any information about the input voice's vocal cords. We want to replace the vocal cords with the carrier signal and apply to it the same treatment that the raw sound from the vocal cords received, which turned it from a meaningless "aaaaaaaaah" into intelligible speech.

This is solved by applying a vertical motion blur to the modulator's image in Photoshop. In the video I used a vertical motion blur of 20 pixels three times. Also, since frequency resolution here is not important whereas time resolution is, it is advised that you edit config.txt in the Photosounder folder and change the value of min_bpo to 0 instead of 24.
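
The image side of this can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: images are float arrays scaled from 0 to 1 with one row per frequency band, a plain moving average stands in for Photoshop's motion blur, and the function names are made up. Turning the result back into sound is still Photosounder's job in lossless mode.

```python
import numpy as np

def vertical_box_blur(image, length=20, passes=3):
    """Smear the image along the frequency axis (axis 0) with a moving average."""
    kernel = np.ones(length) / length
    blurred = image.astype(float)
    for _ in range(passes):
        blurred = np.apply_along_axis(
            lambda column: np.convolve(column, kernel, mode="same"), 0, blurred)
    return blurred

def vocode(modulator, carrier):
    """Multiply the carrier image by the blurred modulator image."""
    envelope = vertical_box_blur(modulator)  # fuses the voice's harmonics together
    return carrier * envelope

# Random stand-ins for the two images: 571 bands by 200 columns.
rng = np.random.default_rng(0)
modulator = rng.random((571, 200))
carrier = rng.random((571, 200))
result = vocode(modulator, carrier)
print(result.shape)
```

The blur is what turns Photosounder's fine frequency resolution into the coarse envelope a vocoder would see, and the multiplication imposes that envelope on the carrier.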

This of course is only basic vocoding. One could just stretch the modulator image around in all directions and in all sorts of ways prior to overlaying it with the carrier image. It will be the subject of my next blog entry, if I can find any example sounds that suit me.


22 July 2009

Graphical sound denoising challenge results

And the winner is Iain Fergusson with the following entry made with GIMP:

Extract of the original song:

Iain's result:

Iain obtained this excellent result and won a free copy of Photosounder worth €99 by following these steps:

  • Find selection of sound which is just noise, copy and paste to new layer

  • Pixelate it with pixel height 1, and width as wide as the layer is

  • Resize noise selection layer to fit entire image

  • Set noise selection layer to 'subtract' - adjust curves if you need more subtraction

  • 'Copy visible'

  • Paste this into a mask on the original image

  • Create black layer below the original

  • Adjust curves on original layer mask to push light parts to white, and carefully, push the very darkest parts to black

  • Save image
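
For readers who prefer code to layers, the steps above can be approximated as follows in Python with NumPy. This is my own reading of the procedure and not Iain's actual workflow: the image is a float array scaled from 0 to 1, the black_point and white_point parameters are hypothetical stand-ins for the curves adjustment, and the function name is made up.

```python
import numpy as np

def denoise(image, noise_start, noise_end, black_point=0.02, white_point=0.3):
    """Subtract a per-band noise profile, then use the result as a mask.

    image: float array in 0..1, shape (bands, columns).
    noise_start, noise_end: column range that contains only noise.
    """
    # Pixelating the noise selection to full width and height 1 amounts to
    # averaging each row; resizing it to the whole image broadcasts that average.
    noise_profile = image[:, noise_start:noise_end].mean(axis=1, keepdims=True)
    # 'Subtract' layer mode.
    subtracted = np.clip(image - noise_profile, 0.0, 1.0)
    # Curves on the mask: push light parts to white and the darkest parts to black.
    mask = np.clip((subtracted - black_point) / (white_point - black_point), 0.0, 1.0)
    # A masked layer over a black layer is a multiplication by the mask.
    return image * mask

# Synthetic example: a noise floor plus one steady tone starting at column 100.
rng = np.random.default_rng(0)
image = 0.2 * rng.random((64, 300))
image[20, 100:] = 1.0
cleaned = denoise(image, 0, 100)
print(image[:, :100].mean(), cleaned[:, :100].mean())
```

On this synthetic image the tone passes through untouched while the average level of the noise-only columns drops sharply.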

You can download the full denoised song here.


2 July 2009

Graphical sound denoising challenge

Removing noise from recorded sound has always been a difficult problem, requiring the use of specific electronic circuits or dedicated computer algorithms. With the recent advent of image-based processing of sound it is now possible to tackle this problem from a different angle with such simple and ubiquitous tools as image editing programs. This is the object of this challenge, denoising sound using graphical techniques.

The sound chosen for this challenge is an 1894 recording of Daisy Bell by Edward M. Favor. Dating from the early days of sound recording, it suffers from heavy noise and artifacts. The goal of this challenge is to remove these undesirable features in a graphical way while preserving the vocal and musical elements in order to enhance the sound quality of this recording.

An extract from the recording

This is an example of the original extract being denoised graphically. It was done in Photoshop in a few minutes using some very simple operations.

Prizes:
The prizes are two full licenses of Photosounder, each worth €99, one for each of the following categories of entries:

  • The image-editor category: for entries done entirely with an image editor and reproducible by any user of such a program.

  • The algorithmic category: for any other entry, but more particularly for entries involving the use of custom-written image-processing code or any process beyond the usage of publicly available user-level tools.

All entries must be sent by e-mail to challenge@photosounder.com before July 16th at noon GMT. The entries will then be reviewed by a panel of listeners and the results will be announced a few days later.

What you'll need :

Rules:
  • A valid entry must consist of the resulting image of dimensions 15,379 x 571, preferably in PNG format, as well as an account of how the image was obtained, detailed enough to allow the process to be reproduced, and be sent by e-mail to challenge@photosounder.com before the deadline.

  • Your denoising method must be practical to use on long sounds and be reproducible in a few minutes of work on a sound of several minutes. Therefore this excludes the recourse to such techniques as paintbrushing parts of the image out.

Tips:
  • To hear your results the way they should be heard make sure to use Photosounder's lossless mode. To do so first load the original sound in Photosounder then load the modified image corresponding to that sound and activate the lossless mode.

  • If in your sound you hear artifacts similar to bubbling, it may be that in your modified image some pixels are much brighter than they originally were. To make sure it doesn't happen you can overlay your modified image with a copy of the original image and set the blending mode of the original image to 'Darken' so that it prevents any single pixel from being brighter than it originally was.
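
For entries in the algorithmic category, the 'Darken' trick from the last tip is a per-pixel minimum. A minimal Python sketch with NumPy, assuming greyscale images stored as float arrays (the function name is made up):

```python
import numpy as np

def darken_blend(modified, original):
    """Keep, for each pixel, the darker of the two values."""
    return np.minimum(modified, original)

original = np.array([[0.2, 0.8],
                     [0.5, 0.1]])
modified = np.array([[0.4, 0.3],
                     [0.5, 0.0]])
print(darken_blend(modified, original))
```

Only the top-left pixel, which had become brighter than the original, is pulled back down. The other pixels keep their modified values.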


15 May 2009

Motion blur sound reverberation

Out of the many possible approaches to processing sounds using Photosounder, one had yet to be tested: additive effects. These additive effects consist of processing a sound and then mixing the result with the original sound. The next series of examples will demonstrate how to graphically create such an effect to achieve something somewhat similar to sound reverberation.

One way to do that is to apply a horizontal blur to a sound's image, then shift it to the right so that the blurred sound doesn't play ahead of the sounds in the original. It is typically done effortlessly in just a few minutes. Here is how I proceeded for the following examples:
-Open a sound in Photosounder
-Save the image
-Open the image in your favourite image editor (Photoshop, GIMP, etc.)
-Duplicate the image's first layer (which we'll keep as a reference)
-Apply a horizontal blur to the new layer (Filter > Blur > Motion Blur... in Photoshop) of about 30 pixels
-Apply the same blur again 2 or 3 times
-Set the layer's blending mode to Lighten so that you can see through it, and move it to the right so that notes in our blurry layer don't start before they hit on the original first layer
-Duplicate that layer
-Blur the copy some more
-Shift it to the right some more so that it doesn't start before the original notes hit
-Optionally adjust the luminosity of that layer so that it can be more intense
-Hide the original layer so that we only see the two blurry layers merged together
-Save the image and open it in Photosounder
-Save the sound it produces to a file
-Open the new "blurry" sound and the original sound in an audio editing program and mix the two sounds together (no timing offset is required)
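
The image-editing part of these steps can be sketched in Python with NumPy. Treat it as a rough illustration only: a moving average stands in for Photoshop's Motion Blur, the shift amounts and the gain are values I picked so that the blur never spreads to the left of a note's onset, and all function names are made up. Mixing the resulting sound with the original still happens in an audio editor.

```python
import numpy as np

def horizontal_blur(image, length=30, passes=3):
    """Smear the image along the time axis (axis 1) with a moving average."""
    kernel = np.ones(length) / length
    blurred = image.astype(float)
    for _ in range(passes):
        blurred = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"), 1, blurred)
    return blurred

def shift_right(image, pixels):
    """Move the image to the right, filling the left edge with black."""
    shifted = np.zeros_like(image)
    if pixels > 0:
        shifted[:, pixels:] = image[:, :-pixels]
    else:
        shifted[:] = image
    return shifted

def reverb_image(image, short_shift=45, long_shift=90, long_gain=1.5):
    """Build the 'blurry' image from a short and a long blurred layer."""
    short_layer = shift_right(horizontal_blur(image, passes=3), short_shift)
    long_layer = shift_right(horizontal_blur(image, passes=6), long_shift)
    long_layer = np.clip(long_layer * long_gain, 0.0, 1.0)  # optional luminosity boost
    return np.maximum(short_layer, long_layer)  # 'Lighten' blending keeps the brighter pixel

# One note on row 2, from column 100 to column 199.
image = np.zeros((4, 400))
image[2, 100:200] = 1.0
wet = reverb_image(image)
print(wet[:, :100].max(), wet[2, 250])
```

With these values nothing sounds before column 100, and the note's tail carries on well past column 200.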

One of the interesting aspects of these steps is that they involve doing exactly the same thing for every sound, meaning that one could just record an action in Photoshop and reproduce the whole process at the press of a key. Of course this is just one way to do it, the goal here being to make the shorter blur start right after the notes originally hit, and the longer blur hold the notes so they can last much longer and slowly fade.

A few examples

Stefon Harris' Until in its original form followed by its "blurred" form.

Same thing with this Rhodes piano piece.

Since the part that is added to the original sound is entirely contained in an image, we have the freedom to get creative and do practically anything we want to do with it. Here as an example the image was shifted up by 60 pixels so that the reverberated sound is one octave higher, thus achieving a different effect.

This of course works not just on musical instruments but on all types of sounds, including speech. Here as an added twist, the blurry image was synthesised twice and the two sound files were put together as one stereo sound, giving the resulting sound great stereophonic qualities.


12 May 2009

Time pixelation on sound

For this series of examples I chose to experiment with what I call time pixelation of sound. It consists of taking the image of a sound, pixelating it horizontally only, then turning it back into a sound.

It is simply achieved by doing the following:
-Open a sound in Photosounder
-Save the image
-Open the image in your favourite image editor (Photoshop, GIMP, MS Paint, etc.)
-Squeeze your image horizontally to a given width, making sure to choose the Nearest Neighbour method
-Stretch your image horizontally back to its original width, making sure to use the Nearest Neighbour method again so the result looks "blocky"
-Open the modified image in Photosounder

(All of the above can be reproduced using the Photosounder Demo)

This has the effect of taking a short bit of the initial sound at regular intervals, and stretching it in time. For varying results, experiment with the width you squeeze the image to, but also with the horizontal offset in the original image. Moving the image a few pixels to the left or right makes the resizing method pick different columns of pixels.
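
The squeeze and stretch can also be written out directly in Python with NumPy, which makes it clear which columns survive. This is a sketch: image editors may round slightly differently when resizing, and the offset parameter is my stand-in for moving the image left or right before squeezing.

```python
import numpy as np

def time_pixelate(image, width, offset=0):
    """Squeeze to `width` columns and stretch back, nearest neighbour both ways."""
    columns = image.shape[1]
    # Squeeze: each of the `width` pixels samples a single source column.
    picked = ((np.arange(width) + 0.5) * columns / width).astype(int) + offset
    picked = np.clip(picked, 0, columns - 1)
    squeezed = image[:, picked]
    # Stretch: repeat each squeezed column until the original width is reached.
    stretch_index = np.arange(columns) * width // columns
    return squeezed[:, stretch_index]

# A one-row "image" whose pixel values are their own column numbers.
image = np.arange(12).reshape(1, 12)
print(time_pixelate(image, 3))             # columns 2, 6 and 10, each held for 4 pixels
print(time_pixelate(image, 3, offset=1))   # columns 3, 7 and 11 instead
```

Changing the offset by a single pixel changes which instants of the sound get held, which is why it is worth experimenting with.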

A few examples

This is 2001: A Space Odyssey's HAL 9000 speaking with only 10 pixels a second.

Once pixelated to about 3 pixels a second and looped back and forth, speech can turn out to adopt some almost catchy musical qualities.

A strings sample (above) reduced to 16 pixels (below):

The arpeggio at the beginning of The Animals' House of the Rising Sun, first reduced to 6 pixels, then 12 pixels, 24, 48, 96 and finally the original full 1200 pixel image. It is interesting to note how this process can "de-arpeggiate" an arpeggio by selecting only notes at regular intervals.


©2008-2009 Michel Rouzic