THE OTHER WORKS AND CONCERNS OF

MICHEL ROUZIC

I'm Michel Rouzic, the sole person behind Photosounder, SplineEQ and Spiral. The name of my first program became the name of my "company" because I couldn't think of a company name, plus I was already using this domain name. I hate naming things; I tried to come up with a company name by starting with 'Michel Rouzic Software', taking the first half of each word and putting it all together, but that gave me Microusoft, and that's just one letter away from Microsoft. And Microsoft? What kind of a name is that for a company!? It might as well be called Small & Flaccid! Think about it!!

This page is all about some of the other things I've done, either alongside or as part of my 3 main projects.

rouziclib

Rouziclib is my personal open source library of code that is shared between my different projects. I created it when maintaining several copies of the same code in several projects became impossible. Now when I update one function in rouziclib, all my projects benefit from it.

Rouziclib contains a lot of the algorithms described below, and a lot of the graphics and mathematics code used in SplineEQ and Spiral, such as code for drawing antialiased lines, circles, frames and gradients, the perceptual colour space used by Spiral or the fast approximations of mathematical functions that make Spiral faster than it would otherwise be.

If you're a developer with code of your own that you reuse across projects, I believe that, like me, you should make a library out of it and consider sharing the sources for the benefit of all.

A live hourly satellite wallpaper of the whole Earth


If you forgot today was eclipse day, the wallpaper reminds you.
What's your desktop wallpaper like? Let me guess, it's a static picture. This is 2024, you can do better than that, step your wallpaper game up! Full pictures of the Earth taken by geostationary satellites are made available on an hourly basis. All you need to do is know where to find them (here and here), write a script to fetch the latest image, put it together using ImageMagick or your own code (I had to make my own program to erase the white border overlay) and make it run every hour!

I obviously enjoy a natural colour view best, which incidentally helps me keep tabs on all the weather in the visible hemisphere of the Earth. Why look out the window when you can know what the weather is like by looking at your desktop?


Combining different infrared layers is fine too.
However, at night the natural colour image stops updating, so I use the airmass view instead. When I started creating my satellite wallpaper script 14 years ago I would combine 3 black and white infrared pictures into one colourful view of the atmosphere, but it was only updated every 6 hours and you couldn't discern much.

The many scripts and code I currently use for this are gathered in this file. Sadly, at the moment it's a bit difficult to set up, plus the resolution of my desktop is hardcoded, so if you want to set it up for yourself you'll have some work to do. The whole thing works as a Windows task that calls a VBS script that calls a batch file that calls wget, ImageMagick's convert and a program of my own. But despair not, for I intend to one day maybe make a simple program that does it all on its own. Do not e-mail me if you can't set it up; however, do e-mail me to express your interest in that eventual program. If you'd like to make that program yourself, be my guest, you can even reuse my border-erasing code.

Sponge Blob Tennis


The animation.

The game.
In May 2007 I saw a GIF of two sponge-shaped gelatinous blobs playing tennis together and I thought that would make a good game. At the same time the GBAX 2007 game-making competition had just been announced, and with almost two months to the deadline I decided to try to make a GP2X game out of it, with a little additional help from the GIF's original creator. The sole judge for that competition didn't care much for pong games though, and my game was and still is a bit rough around the edges.

It has realistic physics (gravity, bouncing and air friction), and the players are not sprites, they're physical blobs that change shape depending on what happens to them. The graphics and fixed-point physics are so well optimised that I could still get 50 FPS after downclocking the GP2X from 200 MHz to 80 MHz, and it features a pretty good AI too. A year later I also added sound effects to the game. I drew those sound effects in pixel art to fit the general theme of the game and synthesised them using ARSS, Photosounder's command-line precursor.



A pixel art game needs pixel art sounds too.
Okay maybe I should explain the game. It's a blend of a tennis game and a pong game. You have a tennis court, it's all in isometric 3D pixel art. On that court you have two players, the "sponge blobs", they can only move side to side and jump. You can aim the ball left or right by hitting the ball off-centre with your blob, and since your shape changes as you blob around it also affects how you return the ball. If your blob is laying low and flat you'll hit the ball with the top end of your gelatinous face which will throw the ball hard, fast and straight. If you're standing tall and straight then the ball will hit you lower and you'll throw a high and slower ball. Finally if you jump up while you're hitting the ball you'll throw the ball so high that the opponent will have to jump in order to catch it lest they lose the point. But if they return the ball far from where you're busy landing you'll have trouble catching it yourself, so this is very much a double-edged sword.

By default you play as the player on the right side, however you can toggle AI control for either player at any time, so you can change sides, play with a friend (achieved by having each player hold one side of the GP2X console; a bit awkward, but it worked) or just let the game play itself. You can try the Windows version of Sponge Blob Tennis, the source code is included, and now you can even try it in your browser! (use the keys 1 2 3 on the numpad, the other keys are U I to toggle AI control and X C V for the left player). And here's a random review.

sRGB - Or how I resent you for not understanding it

I want to tell you about my greatest pet peeve, something that annoys me perhaps more than it should. It's something that you, personally, most likely do not understand despite believing the opposite, even though you really should. I'm talking about gamma compression and how everybody, yourself included, mistakenly assumes that those familiar pixel colour values that range from 0 to 255 are linear. It's not just you, it's almost everybody else: photographers, designers, scientists, developers, even the guys who made Photoshop, or pretty much any image editor for that matter. That's right, the guys who made Photoshop don't seem to quite understand how pixels work.

Let me ask you this: if you have a grey pixel of RGB value 120, 120, 120 and you add it to another pixel of the same value (or just double the first value instead), what value should you get? 240, 240, 240? Wrong!


29 years of Photoshop layers and no one noticed the left cross looks a tad too bright?
You should get 165, 165, 165. Why? Well, first of all, the pixel value 240 doesn't have any relation to the value 120; it's not its double, it's not anything! The RGB values you're familiar with use the sRGB colour space, which means those values are a code for another value, a linear value. Linear values are what you want, they're what represents how much light you get. If you have a linear value of 0.1 (white being 1, black 0), that means you have 5 times less light coming at you than with a value of 0.5. So an sRGB value of 120 actually has a linear value of 0.188, and twice that is obviously 0.376, which when turned back into an sRGB value gives you about 165. In comparison, an sRGB value of 240 represents a linear value of 0.871, more than 2.3 times brighter than the correct value of 165.
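If you want to check those numbers yourself, here's a minimal C sketch using the standard sRGB transfer functions (these are the piecewise curves from the sRGB specification, nothing specific to any particular program):

	#include <math.h>
	#include <stdio.h>

	// 8-bit sRGB value to linear intensity ([0 , 1] range)
	double srgb_to_linear(double s)
	{
		double c = s / 255.;
		return c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
	}

	// linear intensity back to an 8-bit sRGB value
	double linear_to_srgb(double l)
	{
		double c = l <= 0.0031308 ? l * 12.92 : 1.055 * pow(l, 1./2.4) - 0.055;
		return c * 255.;
	}

	int main()
	{
		double sum = srgb_to_linear(120.) + srgb_to_linear(120.);	// 0.188 + 0.188
		printf("120 + 120 = %g\n", linear_to_srgb(sum));		// prints about 165
		return 0;
	}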

I care because that error is everywhere. It's in Photoshop (unless you use its 32-bit/channel mode, which last time I checked simply had a lot of functions missing), pretty much every other image editor, and pretty much every other piece of software except 3D renderers (video games included) and some really expensive professional programs. Even colour spaces are based on the completely arbitrary sRGB compression. And the problem isn't just with adding two pixels, it affects antialiasing, blending and transparencies; even resizing or blurring is affected. For instance, if you blur or shrink an image of a star field, the result will be much darker than it should be, because when you average a star of, let's say, value 240 (0.871) with a background of 0, instead of getting the physically correct average of 176 (0.436, which is 0.871/2) you get the much darker value of 120 (0.188). That's also how you can have pictures with thumbnails that look nothing like the full picture, as shown there.


Someone actually looked at this and thought
"yeah, that looks about right"...
And it's not just software; as a result of that common misconception people get an utterly absurd idea of the relationship between colour values. Look at this completely ridiculous image made by the European Space Agency itself. Don't you think the Moon looks darker than coal (keep in mind it's supposed to be lit by the Sun)? Doesn't that comet look somehow even darker than the darkest material ever? That would be because the person who made this picture does not understand that while albedo is a linear measure of light, sRGB pixel values are not, so they made the images much darker than necessary by effectively turning the correct albedos of 12% for the Moon and 5% for the comet into respectively 1.3% and 0.4%. The whole premise of their article is based on that very misconception. Not even rocket scientists who landed on a comet understand pixel values, so what hope is there for the rest of us?

If you're a developer and you do any kind of operations on pixels, please do the following. Use an internal representation for pixels that is linear, and use at least 12 bits per channel (I use a fixed-point arithmetic format with 15 fractional bits and dithering as detailed here, so I get no banding on gradients), because you need an extra 4 bits to preserve the precision of the darkest pixels. When loading images, convert them to your linear pixel format, and don't convert anything back to sRGB until you save an image or need to show it on screen. Simple lookup tables can do the job quickly both ways. Gamma-compressed sRGB is for storage in files and for displays. Always manipulate linear pixel values otherwise.
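Since 8-bit sRGB only has 256 possible values per channel, the loading direction really is just one small table away. Here's what building such a table might look like, using the 15-fractional-bit fixed-point representation mentioned above as an example (the dithering is left out):

	#include <math.h>

	// 256-entry LUT: 8-bit sRGB -> linear fixed point with 15 fractional bits
	static int lut_srgb_to_lin[256];

	void init_srgb_lut(void)
	{
		for (int i = 0; i < 256; i++)
		{
			double c = i / 255.;
			double lin = c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
			lut_srgb_to_lin[i] = (int) nearbyint(lin * 32768.);	// so that 1.0 becomes 0x8000
		}
	}

The reverse direction can be a direct table too, just a larger one, with one entry per possible fixed-point value.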

I also made a very simple online converter to switch back and forth between linear and sRGB which is quite handy if you need to calculate some values manually.

10-bit square root pixels - An advantageous texture format

Now that you understand the importance of doing all operations on linearised pixels, and hopefully also that 8 bits per channel isn't quite enough even with gamma compression, we can consider practical aspects. So far I've used 3 different pixel formats: 8-bit sRGB, a 15-bit (or 16-bit depending on how you count) fixed-point linear format, and the gold standard of pixel formats, the 32-bit floating-point linear format.

It doesn't get any better than the latter: it's linear, it's a native format (unlike, say, 16-bit floating point) on both CPUs and GPUs and in every language, it's high precision, and it can handle out-of-range values, which is very useful mostly when processing raw images, as they can have negative values that are important to consider when scaling down or denoising. The big problem is that it's 4 times more data than 8-bit sRGB, so you waste a lot of time transferring textures in that format and a lot of memory storing them. You could use 8-bit sRGB, however it's low precision, which might show when you make mipmaps of a smooth gradient, and unless you want to use a lookup table (which you probably wouldn't want to on a GPU) linearising takes a lot of math, including a very time-consuming pow(). We could compromise between the two with a 16-bit fixed-point format, which is good and simple but still wasteful, or 16-bit half floats, which complicate the implementation significantly, and both are more wasteful than they need to be.


The square root and sRGB functions similarly offer higher precision in the darks (lower left corner).
Which brings us to a much better option: square root format. First of all, 8-bit sRGB is itself quite wasteful: for memory alignment reasons we pack the channels into 32-bit words, and the 4th byte is either wasted or used for transparency, which most of the time we don't need. This means we have 8 extra bits to spread between 3 channels. I choose to give 2 more bits to each channel, plus 2 extra bits for the green channel. By my estimate, in terms of perceptual brightness expressed as a linear intensity percentage of white, full red is equivalent to the intensity of 16% white, full green 73% and full blue 11%. This is what I use in Spiral to make sure that every colour comes off as about equally bright, and while these aren't perfect weights they're pretty close to the mark, give or take a few percentage points. These weights mean that green appears more than 4 times brighter than either red or blue, so giving it 2 extra bits is well warranted. So R10-G12-B10 is a very good choice.
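For illustration, here's one plausible way to pack and unpack such R10-G12-B10 words (the exact bit layout is my own assumption, any ordering works as long as it's consistent):

	#include <stdint.h>

	// pack 10-bit red, 12-bit green and 10-bit blue into one 32-bit word
	uint32_t pack_r10g12b10(uint32_t r, uint32_t g, uint32_t b)
	{
		return r << 22 | g << 10 | b;
	}

	// and unpack it again
	void unpack_r10g12b10(uint32_t w, uint32_t *r, uint32_t *g, uint32_t *b)
	{
		*r = w >> 22;
		*g = w >> 10 & 0xFFF;
		*b = w & 0x3FF;
	}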

But using more bits isn't the key. So far we only had a choice between sRGB and linear formats, the former being advantageous by giving more precision in the darks but slow to convert, and the latter requiring 4 extra bits per channel to obtain the same precision in the darks. I choose instead the best compromise: storing the square root of the linear value, in integer format of course. The reason it's so advantageous is that while it's close to sRGB in terms of giving more precision in the darks, it couldn't be simpler or faster to linearise: all you have to do is multiply it by itself!

Let's say we start off with a linear value of 0.09. We take its square root, 0.3, multiply it by 1023 (all of this can be done very quickly when loading an 8-bit sRGB image, using an sRGB-to-square-root-format lookup table) and store it as a 10-bit integer, 307. Then in order to use it we need to linearise it by multiplying it by itself, which gives us an integer value of 94,249 out of a maximum of 1023². From there we can either do what we need in integer form and put the result back into square root format using an integer sqrt function like this one (or even just sqrt(), which could be faster), or convert it to float and multiply it by a float constant equal to 1023⁻² to bring it into the [0 , 1] range. Let's say for example that I want to average 4 pixel values into one to generate mipmaps; this must be done linearly but the result must be converted back to square root format, so I can simply do this: isqrt((p0*p0 + p1*p1 + p2*p2 + p3*p3) >> 2);
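Here's what the loading and linearising paths might look like in practice (a sketch of my own based on the description above, function names included):

	#include <math.h>
	#include <stdint.h>

	// 256-entry LUT: 8-bit sRGB -> 10-bit square root of the linear value
	static uint16_t lut_srgb_to_sqrt[256];

	void init_sqrt_lut(void)
	{
		for (int i = 0; i < 256; i++)
		{
			double c = i / 255.;
			double lin = c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
			lut_srgb_to_sqrt[i] = (uint16_t) nearbyint(sqrt(lin) * 1023.);
		}
	}

	// linearising a stored value takes just one multiplication
	float sqrt_format_to_linear_float(uint16_t p)
	{
		return (float) (p*p) * (1.f / (1023.f*1023.f));
	}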

In conclusion, this format is compact (a full RGB pixel in one 32-bit word), nearly as precise in the darks as sRGB, and almost free to linearise.

It is mainly intended as an internal replacement for 8-bit sRGB, so that you can load a picture, convert it to 10-bit square root format and then forget all about sRGB, then tile and mipmap it and send the tiles to the GPU as needed for display, where every pixel will be linearised into a float vector upon loading.

Polynomial lookup table-based approximations of mathematical functions

In approximating mathematical functions on computers there are three main factors to balance: speed, precision and memory usage. Approximations using simple lookup tables (LUTs) that contain precalculated values of the function to approximate are typically quite fast but not very precise, and they take up a lot of memory. They also become slower as the table grows, since the whole table can no longer fit in the lowest cache levels. Precision can be improved by linearly interpolating between two neighbouring LUT entries, at the cost of speed. Another approach is to approximate a function using polynomials. Polynomial approximations have the advantage of requiring very little memory, but their precision can only be improved by increasing the polynomial order, which both slows down the computation tremendously and limits the maximum precision, since every added operation in the computation of a value adds rounding error.

There is however a very little-known way (I haven't found anything about it anywhere; if you have, let me know) to combine the advantages of lookup tables and polynomial approximations. It is simply to have a lookup table that contains the coefficients of a quadratic polynomial or higher (a sextic polynomial is fine too), each polynomial accurately approximating only a small subrange of the whole range, with all the entries of the LUT together covering the whole desired range. The advantages are many: a quadratic polynomial can be computed quickly, unlike with linearly interpolated LUTs there is no interpolation between two values to perform, precision increases by up to a factor of 8 for every doubling of the LUT's size (as opposed to 4 for interpolated LUTs and 2 for plain LUTs), and high precision can be achieved with small tables. I typically use tables of only 128 segments which therefore take up only a few kilobytes, allowing the LUTs of several functions to fit inside a CPU core's 16 to 32 kB Level 1 cache.


The cosine and its quadratic approximation over the blue range [0.375 , 0.4375]
with c2 = 16.35989, c1 = -16.76637 and c0 = 3.27985.

The error curve between the approximated function
and its approximation over the blue range.

The result is found by obtaining from the LUT the three quadratic coefficients c0, c1 and c2, so that f(x) ≈ (c2x + c1)x + c0. Those coefficients should be fetched from the LUT at an index typically derived from the most significant bits of x if x is an integer or fixed-point arithmetic number, or, if x is a floating-point number, from the most significant bits of the mantissa, or even from a combination of the lower bits of the exponent and the upper bits of the mantissa, depending on the range covered by the LUT. The top of the mantissa is of course best suited for ranges such as x = [0.5 , 1.0[ or [1.0 , 2.0[, but simply adding 1 to x at the start of the function can make it suitable for an x = [0.0 , 1.0[ range.
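As an illustration, here's a sketch of the lookup and evaluation for a double over [1.0 , 2.0[, using the top 7 bits of the mantissa to index a 128-segment table (the layout is my own, not rouziclib's actual code):

	#include <stdint.h>
	#include <string.h>

	// 128 segments over [1.0 , 2.0[, 3 coefficients each: c0, c1, c2
	static double coef[128][3];

	double lut_quadratic_eval(double x)		// x must be in [1.0 , 2.0[
	{
		uint64_t bits;
		memcpy(&bits, &x, sizeof bits);		// get the IEEE 754 representation
		int index = bits >> 45 & 0x7F;		// top 7 bits of the 52-bit mantissa
		const double *c = coef[index];
		return (c[2]*x + c[1])*x + c[0];	// f(x) ~= c2*x^2 + c1*x + c0
	}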

A small detail remains, which is how to compute the coefficients in the LUT in the first place, or in other words how to get a quadratic fit for a segment of the function to approximate. I choose not to minimise the average squared error but rather to minimise the maximum error, since the maximum error is what matters to me most when evaluating the precision of an approximation. I do this by first calculating c2 from how far the point in the middle of the subrange (delimited by start and end) deviates from the chord joining f(start) and f(end), then working on f(x) - c2x² to find c1 and c0, like so:

    void find_quadratic_fit(double (*f)(double), double start, double end, double *c0p, double *c1p, double *c2p)
    {
    	double c0, c1, c2, middle, p1, p2, height2, width2;
    
    	// slope of the chord joining f(start) and f(end)
    	c1 = (f(end) - f(start)) / (end - start);
    
    	// c2 from how far f(middle) deviates from that chord
    	middle = (end+start) * 0.5;
    	height2 = (f(end)-c1*end) - (f(middle)-c1*middle);
    	width2 = end - middle;
    	c2 = height2 / (width2*width2);
    
    	// c1 as the average slope of f(x) - c2*x^2 over the two half-segments
    	p1 = 0.75*(end-start) + start;
    	p2 = 0.25*(end-start) + start;
    	c1 = ((f(p1)-c2*p1*p1) - (f(start)-c2*start*start)) / (p1-start);
    	c1 += ((f(end)-c2*end*end) - (f(p2)-c2*p2*p2)) / (end-p2);
    	c1 *= 0.5;
    
    	// c0 as the average residual at both endpoints
    	c0 = ((f(end)-((c2*end + c1)*end)) + (f(start)-((c2*start + c1)*start))) * 0.5;
    
    	*c0p = c0;
    	*c1p = c1;
    	*c2p = c2;
    }

With that approach the error curve between the approximated function and the approximation should be S-shaped, with peaks of equal absolute height at the start, the end, one quarter and three quarters of the segment.
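For example, filling a 128-segment table for the cosine (of turns) over [0 , 1[ could look like this, using find_quadratic_fit as defined above (the build function is hypothetical, just to show the idea):

	#include <math.h>

	double cos_tr(double x)		// cosine of turns
	{
		return cos(x * 2.*M_PI);
	}

	void build_cos_lut(double coef[128][3])
	{
		for (int i = 0; i < 128; i++)
		{
			double start = i / 128., end = (i+1) / 128.;
			find_quadratic_fit(cos_tr, start, end, &coef[i][0], &coef[i][1], &coef[i][2]);
		}
	}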

You can find a few examples of that approach being used in rouziclib, with fixed-point arithmetic examples for atan2, cos and division (the reciprocal function), and floating-point examples for log2, exp2, sqrt and a high-precision quintic cos.

Help needed! - Symbolic convolution of a triangle and a Gaussian function

Mathematicians, listen: I have a problem I can't solve on my own. If you can significantly help me solve it, I'll give you a license for whatever you want. The problem is how to find the symbolic result for the convolution of triangles, defined by multiplying rotated Heaviside step functions together, with a Gaussian function.

The problem is outlined in this StackExchange post.


The symbolic solution for a square. I want to do the same with triangles now.

Why do I need an explicit solution, or at least an accurate approximation, for this? For the antialiased rasterisation of polygons of course! So far I know how to make rectangles and circles that way; using this I can directly draw lines, squares, circles or any combination of these with perfect Gaussian antialiasing, using a relatively simple formula for each pixel. No oversampling, no actual convolution, no post-processing, just a direct perfect result with only a few table lookups (or even directly using exp() and erf()) and basic operations for each pixel. But the approach I use for rectangles, replacing Heaviside step functions with error functions, only works if you have right angles. With any other kind of angle you get negative values that don't belong and corners that are wrong, and there's no simple obvious way to fix it, especially if you take into consideration that the Gaussian function might sometimes be many times larger than your triangle.
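For reference, the 1D identity that makes the rectangle case work (a standard result, written here in LaTeX) turns each pair of Heaviside steps into a pair of error functions:

$$\mathbf{1}_{[a,b]} * \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/(2\sigma^2)} = \frac{1}{2}\left[\operatorname{erf}\!\left(\frac{x-a}{\sigma\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{x-b}{\sigma\sqrt{2}}\right)\right]$$

An axis-aligned rectangle is the product of two such 1D boxes, so its convolution with a 2D Gaussian is just the product of two of these expressions, which is exactly what breaks down as soon as an edge isn't axis-aligned.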

Flat-top linear filtering for non-integer image scaling


This is what those peaks from a naive non-integer bilinear filtering look like.
This is a solution to the following problem: how do you properly but quickly scale down an image by a non-integer ratio? Say you have a 1024x1024 image but you need it displayed at 700x700, which means you need it downscaled by a factor of 1.46x, not 2x or 3x. Upscaling is simple, you just interpolate between at least 4 pixels to make a new one, but downscaling needs more than that to properly filter/antialias the image. So if you wanted to downscale by a factor of 2x you could just double the span of your triangle-shaped convolution kernel. That works quite well, but not so well with non-integer ratios; particularly around 1.5x, this approach gives you big undesirable spikes in the result. Of course it does, since the Fourier transform of a triangle is a squared sinc function, which only has zero-crossings at integer multiples of the sampling frequency.


Linear filtering at varying downscaling ratios. The spikes in the sum are undesirable.

My solution: flat-top linear filtering. The sum is desirably much smoother.

Thankfully there's a simple modification we can bring to that triangle-shaped convolution kernel, and that's giving it a flat top. So now, to calculate the weight of a pixel, we need to use two of the following three parameters: the height of the flat top, top; the position of the end of the flat top, knee; and the slope that comes after that knee, slope. For a scaling ratio of n and an absolute distance from the pixel of x, the weight can either be obtained by making x be at least the value of knee and then finding the height on the slope, or by finding the height on the slope first and then making sure it's not higher than top. And of course, as always, only look for weights in the [-n , n] range.

	x = fabs(x);

	// first method
	x = max(x, knee);	// this makes x be at least knee
	y = slope * (x-n);

	// second method
	y = slope * (x-n);
	y = min(y, top);	// this makes y (the weight) be no more than top

And here's how we calculate those 3 parameters, which only needs to be done once for a given scaling ratio of course.

	knee = 0.5 - fabs(fmod(n, 1.) - 0.5);		// the knee position ping pongs within [0 , 0.5] depending on n
	midpoint = floor(n+0.5);			// the mid-point of the current trough segment
	trough = 2.*midpoint - midpoint*midpoint/n;	// the height of the trough
	top = (1. - knee/n) / trough;			// the height of the flat top
	slope = -top / (n-knee);			// and the most important, the slope

So knee is easy to compute, which might make the first method more advantageous in a GPGPU context, as it might be quicker to recalculate it every time than to read it from global memory. It's a bit hard to explain how it's all computed, so I guess you shouldn't worry about it and just copy my code. Another solution to the same problem, which I used before this one, was to compute the sum of all the weights for each output pixel, then divide the pixel values by that sum. It's a less elegant solution, as it adds more math for each pixel and does undesirable things on the edges of the image, where the sum of the weights goes towards zero.
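Put together, a complete weight function might look like this (a sketch of my own that assembles the two snippets above; flat_top_weight is a made-up name):

	#include <math.h>

	// weight of a source pixel at absolute distance x from the output
	// pixel's centre, for a downscaling ratio of n (n >= 1)
	double flat_top_weight(double x, double n)
	{
		double knee, midpoint, trough, top, slope;

		x = fabs(x);
		if (x >= n)
			return 0.;	// outside of the [-n , n] range

		// the 3 parameters, ideally precalculated once per scaling ratio
		knee = 0.5 - fabs(fmod(n, 1.) - 0.5);
		midpoint = floor(n+0.5);
		trough = 2.*midpoint - midpoint*midpoint/n;
		top = (1. - knee/n) / trough;
		slope = -top / (n-knee);

		x = fmax(x, knee);	// first method: make x be at least knee
		return slope * (x-n);
	}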

The purpose of this is for me to display images on screen using tiled mipmaps: the mipmap level right above the needed on-screen dimensions is selected, then displayed at the right dimensions using this flat-top linear filtering. Note that this is good for every downscaling factor, but not for upscaling; for that you need to switch back to simple bilinear interpolation.

Contact

You can contact me by e-mail at