So I put off working on this entry long enough that it’s now two entries’ worth of data in one.
So, the implementation of Improved Perlin noise from GPU Gems 2 boils down to 48 pixel shader instruction slots (9 texture, 39 arithmetic). That’s one octave of noise. What I needed, desperately, was a faster implementation of noise, where the base quality doesn’t matter (especially useful for things such as fBm and the like).
In the FIRST GPU Gems, in the chapter on Improved Perlin Noise, Ken Perlin makes a quick note about how to make a cheap approximation of Perlin noise in the shader, using a volume texture. The technique is straightforward, but it took me some effort to understand exactly what was supposed to go into the volume texture.
In my case, I ended up using a 32x32x32 volume texture to simulate an 8x8x8 looping sample of Perlin noise space (so 4 texels per unit of noise). Essentially, when sampling this texture, divide the world position by 8, and use that as the (wrapped) texcoord into the volume.
The first trick is that it has to be LOOPING Perlin noise. But how do you generate such a thing?
Turns out, in the reference implementation of Improved Noise, there are a bunch of instances where there are +1s. For instance:
A = p[X ]+Y; AA = p[A]+Z; AB = p[A+1]+Z;
B = p[X+1]+Y; BA = p[B]+Z; BB = p[B+1]+Z;
(Later, AA, AB, BA, and BB are also accessed with +1s).
Figuring out how to make the noise wrap at a specific value (in my case, 8), was a matter of rethinking those as follows:
A = p[X ]; // note: no +Y here
AA = p[A+Y] (+Z); // +Z in parens because it actually gets added later, like the Y does here
AB = p[A+(Y+1)] (+Z);
B = p[X+1]; // again, no +Y
BA = p[B+Y] (+Z);
BB = p[B+(Y+1)] (+Z);
So, really, the +1s are added to the coordinate added earlier.
So, to make the noise wrap at a certain value, you need to take those (coordinate+1)s and change each into a ((coordinate+1)%repeatLocation).
The final version of the texture shader that generates noise that loops at a specific location is as follows:
// permutation table
static int permutation[] = { 151,160,137,91,90,15,
131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,
190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,
102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,
223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,
129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,
138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180
};
// gradients for 3d noise
static float3 g[] = {
1,1,0,
-1,1,0,
1,-1,0,
-1,-1,0,
1,0,1,
-1,0,1,
1,0,-1,
-1,0,-1,
0,1,1,
0,-1,1,
0,1,-1,
0,-1,-1,
1,1,0,
0,-1,1,
-1,1,0,
0,-1,-1,
};
int perm(int i)
{
    return permutation[i % 256];
}

float3 texfade(float3 t)
{
    return t * t * t * (t * (t * 6 - 15) + 10); // new curve
    // return t * t * (3 - 2 * t); // old curve
}

float texgrad(int hash, float3 p)
{
    return dot(g[hash%16], p);
}

float texgradperm(int x, float3 p)
{
    return texgrad(perm(x), p);
}

float texShaderNoise(float3 p, int repeat, int base = 0)
{
    int3 I = fmod(floor(p), repeat);
    int3 J = (I+1) % repeat.xxx;
    I += base;
    J += base;
    p -= floor(p);
    float3 f = texfade(p);
    int A = perm(I.x);
    int AA = perm(A+I.y);
    int AB = perm(A+J.y);
    int B = perm(J.x);
    int BA = perm(B+I.y);
    int BB = perm(B+J.y);
    return lerp( lerp( lerp( texgradperm(AA+I.z, p + float3( 0, 0, 0) ),
                             texgradperm(BA+I.z, p + float3(-1, 0, 0) ), f.x),
                       lerp( texgradperm(AB+I.z, p + float3( 0, -1, 0) ),
                             texgradperm(BB+I.z, p + float3(-1, -1, 0) ), f.x), f.y),
                 lerp( lerp( texgradperm(AA+J.z, p + float3( 0, 0, -1) ),
                             texgradperm(BA+J.z, p + float3(-1, 0, -1) ), f.x),
                       lerp( texgradperm(AB+J.z, p + float3( 0, -1, -1) ),
                             texgradperm(BB+J.z, p + float3(-1, -1, -1) ), f.x), f.y), f.z);
}
Whee!
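If you want to convince yourself that the wrapping really works without firing up a shader, here is a rough CPU-side port in Python. It is only a sketch: the names (`tiling_noise`, `grad_perm`, and so on) are mine, and a seeded shuffle stands in for the reference permutation table, so the values will not match the shader's output. What it should show is that the noise repeats every `repeat` units on each axis.

```python
import math
import random

# Assumption: a seeded shuffle stands in for the reference permutation table.
_rng = random.Random(1234)
PERMUTATION = list(range(256))
_rng.shuffle(PERMUTATION)

# The 16 gradient directions used by the shader.
GRADIENTS = [
    (1, 1, 0), (-1, 1, 0), (1, -1, 0), (-1, -1, 0),
    (1, 0, 1), (-1, 0, 1), (1, 0, -1), (-1, 0, -1),
    (0, 1, 1), (0, -1, 1), (0, 1, -1), (0, -1, -1),
    (1, 1, 0), (0, -1, 1), (-1, 1, 0), (0, -1, -1),
]


def perm(i):
    return PERMUTATION[i % 256]


def fade(t):
    return t * t * t * (t * (t * 6 - 15) + 10)


def lerp(a, b, t):
    return a + (b - a) * t


def grad_perm(index, x, y, z):
    gx, gy, gz = GRADIENTS[perm(index) % 16]
    return gx * x + gy * y + gz * z


def tiling_noise(x, y, z, repeat, base=0):
    fx, fy, fz = math.floor(x), math.floor(y), math.floor(z)
    # Python's % is non-negative for a positive modulus, so negative inputs wrap too.
    ix, iy, iz = fx % repeat, fy % repeat, fz % repeat
    # The "+1" lattice neighbours wrap back to 0 at the repeat boundary.
    jx, jy, jz = (ix + 1) % repeat, (iy + 1) % repeat, (iz + 1) % repeat
    ix, iy, iz = ix + base, iy + base, iz + base
    jx, jy, jz = jx + base, jy + base, jz + base
    x, y, z = x - fx, y - fy, z - fz
    u, v, w = fade(x), fade(y), fade(z)
    a = perm(ix)
    aa, ab = perm(a + iy), perm(a + jy)
    b = perm(jx)
    ba, bb = perm(b + iy), perm(b + jy)
    return lerp(
        lerp(lerp(grad_perm(aa + iz, x, y, z), grad_perm(ba + iz, x - 1, y, z), u),
             lerp(grad_perm(ab + iz, x, y - 1, z), grad_perm(bb + iz, x - 1, y - 1, z), u), v),
        lerp(lerp(grad_perm(aa + jz, x, y, z - 1), grad_perm(ba + jz, x - 1, y, z - 1), u),
             lerp(grad_perm(ab + jz, x, y - 1, z - 1), grad_perm(bb + jz, x - 1, y - 1, z - 1), u), v),
        w)
```

Sampling at `(0.3, 1.7, 2.2)` and at `(8.3, 9.7, -5.8)` with `repeat=8` should give the same value up to floating-point error, since each coordinate differs by a multiple of 8.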
So, the second trick: the texture actually needed to contain two values (R and G channels), to act as real and imaginary parts. That part was simple: I added a base parameter (in the code above) so that I could offset into a different 8x8x8 cube of noise, and I drop that different 8x8x8 noise into the G channel.
Finally! We have a texture with 8x8x8 noise. But 8-cubed noise sucks, because it’s ridiculously repetitive. That’s where that weird imaginary part comes into play. You sample the 8-cube volume again, but with the position divided by 9 (so it’s lower frequency). You then use the (real component of the) low-frequency sample as an angle (scaled by 2pi) to rotate the high-frequency sample’s real and imaginary parts, the same way you’d rotate a complex number.
float noiseFast(float3 p)
{
    p /= 8; // because the volume texture is 8x8x8 noise, divide the position by 8 to keep this noise in parity with the true Perlin noise generator.
    float2 hi = tex3D(noise3dSampler, p).rg*2-1; // High frequency noise
    half lo = tex3D(noise3dSampler, p/9).r*2-1; // Low frequency noise
    half angle = lo*2.0*PI;
    float result = hi.r * cos(angle) + hi.g * sin(angle); // Use the low frequency as a rotation angle applied to the high frequency's real and imaginary parts.
    return result; // done!
}
And that’s it! Compare the instruction counts of the real Perlin noise to this fast fake:
Old (high-quality): approximately 48 instruction slots used (9 texture, 39 arithmetic)
New (lower-quality): approximately 20 instruction slots used (2 texture, 18 arithmetic)
Essentially, wherever I don’t need the full quality noise, I can more than halve my instruction count on noise generation. Score!
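The rotation step in noiseFast is easy to poke at on its own. Here is a minimal Python sketch of just that line, with the two texture fetches replaced by plain arguments; `rotate_sample` is a made-up name, not something from the shader.

```python
import math


def rotate_sample(hi_real, hi_imag, lo):
    """Combine one high-frequency (real, imag) pair with a low-frequency value in [-1, 1]."""
    angle = lo * 2.0 * math.pi
    # Same expression as the shader: project (real, imag) onto the direction given by the angle.
    return hi_real * math.cos(angle) + hi_imag * math.sin(angle)
```

With `lo = 0` the result is just the real channel, and with `lo = 0.25` (a quarter turn) it is the imaginary channel. Everything in between is a blend of the two, and the result never exceeds the length of the (real, imag) pair, which is why the low-frequency sample can break up the repetition without blowing out the range.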
Here’s a comparison: on the left, the weird confetticrete chair with the original noise, and on the right is the new faster noise:


Old (left) vs. New (right)
They look roughly the same. There are some artifacts on the new one (the diamond-shaped red blob on the upper-right of the new chair, due to the trilinear filtering), but it’s way faster.
Okay, I have some cool Perlin noise stuff. But man cannot live on Perlin noise alone, so I decided to implement cellular noise, as well.
Turns out, there’s something called Worley noise which does exactly what I was hoping to do. Implementation was pretty simple.
void voronoi(float3 position, out float f1, out float3 pos1, out float f2, out float3 pos2, float jitter=.9, bool manhattanDistance = false )
{
    float3 thiscell = floor(position)+.5;
    f1 = f2 = 1000;
    float i, j, k;
    float3 c;
    for(i = -1; i <= 1; i += 1)
    {
        for(j = -1; j <= 1; j += 1)
        {
            for(k = -1; k <= 1; k += 1)
            {
                float3 testcell = thiscell + float3(i,j,k);
                float3 randomUVW = testcell * float3(0.037, 0.119, .093);
                float3 cellnoise = perm(perm2d(randomUVW.xy)+randomUVW.z);
                float3 pos = testcell + jitter*(cellnoise-.5);
                float3 offset = pos - position;
                float dist;
                if(manhattanDistance)
                    dist = abs(offset.x)+abs(offset.y) + abs(offset.z);
                else
                    dist = dot(offset, offset);
                if(dist < f1)
                {
                    f2 = f1;
                    pos2 = pos1;
                    f1 = dist;
                    pos1 = pos;
                }
                else if(dist < f2)
                {
                    f2 = dist;
                    pos2 = pos;
                }
            }
        }
    }
    if(!manhattanDistance)
    {
        f1 = sqrt(f1);
        f2 = sqrt(f2);
    }
}
The gist is that each unit cube cell has a randomly-placed point in it. For each point being evaluated by the shader, you find the distance to the nearest point (a value called “F1”), and the distance to the next-nearest (“F2”), etc. (to as many as you care about, though anything past F4 starts to look similar and uninteresting). Using linear combinations of these distances gives interesting results:


Left: F1 Right: F2


Left: F2-F1 Right: (F1+F2)/2
Something cool to do, also, is to use Manhattan distance instead of standard Euclidean distance to calculate the distance. You end up with much more angular results. Here are the same 4 calculations, using Manhattan distance:
Considering that a few levels of my current project will take place in a metallic fortress, this will especially come in handy.
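For anyone who wants to play with F1, F2, and the Manhattan variant outside of a shader, here is a small Python sketch of the same 27-cell search. It is an approximation of the idea, not a port: `feature_point` and `worley` are names I made up, and a seeded PRNG per cell replaces the shader's permutation lookup, so the pattern itself will differ.

```python
import math
import random


def feature_point(cx, cy, cz, jitter):
    """Return the jittered feature point for the integer cell (cx, cy, cz)."""
    # Assumption: a seeded PRNG replaces the shader's permutation-texture lookup.
    seed = ((cx * 73856093) ^ (cy * 19349663) ^ (cz * 83492791)) & 0xFFFFFFFF
    rng = random.Random(seed)
    # Cell centre plus a random offset of at most jitter/2 on each axis.
    return tuple(c + 0.5 + jitter * (rng.random() - 0.5) for c in (cx, cy, cz))


def worley(position, jitter=0.9, manhattan=False):
    """Return (f1, f2): distances to the nearest and second-nearest feature points."""
    base = [math.floor(c) for c in position]
    f1 = f2 = math.inf
    # Check the cell containing the position and its 26 neighbours.
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            for k in (-1, 0, 1):
                point = feature_point(base[0] + i, base[1] + j, base[2] + k, jitter)
                offset = [a - b for a, b in zip(point, position)]
                if manhattan:
                    dist = sum(abs(o) for o in offset)
                else:
                    dist = math.sqrt(sum(o * o for o in offset))
                if dist < f1:
                    f1, f2 = dist, f1
                elif dist < f2:
                    f2 = dist
    return f1, f2
```

From there, `f2 - f1` or `(f1 + f2) / 2` give the combinations shown in the captions above. Setting `jitter=0` is a handy sanity check: every feature point sits at its cell centre, so sampling at a cell centre gives `f1 = 0` and `f2 = 1` with either distance metric.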
So, what can you do with these?
I, predictably, have made a few test textures:
Also, it still looks pretty cool if you use fBm on it. For instance:
But I hear you asking “duz it wrok n 3deez, Drilian?!?!?!” Oh, I assure you it does!
And now I hear you asking “Can u stop typing nau? I is tir0d of reedin.” (or alternately, “I is tir0d uv looking @ imagez sparsely scattered thru the text taht I dun feel liek reedin.”) To this, I say: Sure, but it worries me that you’re asking your questions in some form of lolcat.
That’s all I got.
So yesterday I got the crack filling up and running.
Tonight, I improved the routine dramatically.
The problem was, the edge-expanding algorithm I used was detecting way more edges than it needed to. Here’s an image of a normal map generated using this (old, bad) method (I made it render ONLY the skirts, for illustration):
As you can see, way more edges through the UV charts were getting expanded than necessary. This was messing up the maps, because there were angles and edges where there didn’t need to be, and it was introducing artifacts, especially at lower mip levels.
The problem arose because each of those “extra” edges marked areas where the vertex positions were the same, but the texcoords were different. Since the original algorithm was using the vertex’s index as the identifying feature, each texcoord change meant that the indices for neighboring triangles were different, blah blah blah, you get the point.
Basically, the system was rewritten to glom together vertices with the same uv map coordinates, and treat them as one single vertex. All of those interior edges get discarded. Because a single “vertex” could actually be composed of multiple source vertices, the edge expanding code had to be modified to take that into account.
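To make the difference concrete, here is a hedged Python sketch of the idea (not the actual tool code, which isn't shown here). It assumes an outline edge is one used by exactly one triangle, and it merges vertices by whatever key you hand it; `boundary_edges` and the tiny two-triangle mesh are invented for illustration.

```python
from collections import Counter


def boundary_edges(triangles, vertex_key):
    """Return edges used by exactly one triangle, after merging vertices that share a key."""
    ids = {}
    counts = Counter()
    for tri in triangles:
        # Map each source vertex to a merged id: same key, same id.
        merged = [ids.setdefault(vertex_key(v), len(ids)) for v in tri]
        for a, b in ((merged[0], merged[1]), (merged[1], merged[2]), (merged[2], merged[0])):
            if a != b:  # skip edges that collapsed to a single merged vertex
                counts[(min(a, b), max(a, b))] += 1
    return [edge for edge, n in counts.items() if n == 1]


# A quad split into two triangles. Vertices 3 and 4 duplicate the UVs of 0 and 2.
uvs = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1)]
tris = [(0, 1, 2), (3, 4, 5)]

by_index = boundary_edges(tris, lambda v: v)    # old way: identity is the vertex index
by_uv = boundary_edges(tris, lambda v: uvs[v])  # new way: identity is the UV coordinate
```

Keyed by index, the shared diagonal looks like two separate outline edges, so all six edges get expanded. Keyed by UV, the diagonal is recognized as interior and only the four outer edges of the quad remain.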
Here’s the old way again, followed by the NEW way (And then the new way completely filled in):
As you can see, they’re now proper outlines (not outandsometimesinlines), and the actual outer areas are much cleaner.
So, here’s a new render (and its diffuse map). I modified the concrete because I was sick of all of my pictures being grayscale, so here’s my artist’s rendition of “Gray Chair That A Clown Puked Onto”:
That’s all! I’m going to release the code that I’m using for all of this, but I want to clean it up just a bit, and add variable gutter width support (instead of the lame hardcoded way that I have it now).
But for now…away!
I took it easy today, so I was barely near the computer, but I did make some awesome progress.
Last night, I was able to finally get a prototype of my texture caching setup going.
Right now, it’s a command-line tool that does the following:
Simple enough. What I ended up with was as follows:
Not bad, but for two things:
Both are solvable problems, and I opted to tackle the normal mapping first.
How does one generate a normal map with a procedural function?
In my case, I have the procedural function not only generate a color but a height. Generating three heights in close proximity (using (pos), (pos+tangent*scaler), (pos+bitangent*scaler)) gives me two edges which I can take the cross-product of to get a pixel normal map. Adding this gave me some better shading (but didn’t fix the cracks):
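Here is roughly what that three-sample trick looks like as a Python sketch. The assumption (mine, not spelled out above) is that each sampled height displaces the point along the surface normal before the edges are built; `height_normal`, `step`, and the helper names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def height_normal(height, pos, tangent, bitangent, normal, step=0.01):
    """Estimate a normal from three height samples taken at and near pos."""
    def displaced(p):
        # Assumption: the height pushes the point out along the surface normal.
        h = height(p)
        return tuple(c + n * h for c, n in zip(p, normal))

    def offset(direction):
        return tuple(c + d * step for c, d in zip(pos, direction))

    p0 = displaced(pos)
    p1 = displaced(offset(tangent))
    p2 = displaced(offset(bitangent))
    # Two edges of the tiny displaced triangle; their cross product is the normal.
    e1 = tuple(a - b for a, b in zip(p1, p0))
    e2 = tuple(a - b for a, b in zip(p2, p0))
    n = cross(e1, e2)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)
```

On a flat height function this just returns the surface normal. With a height that rises along the tangent, such as `lambda p: 0.5 * p[0]` on an axis-aligned frame, the result tips away from the slope, to roughly `(-0.447, 0, 0.894)`.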
The normal map generated is in object space (though it could easily be in world space, assuming a static object). This simplifies the lighting code (I simply transform the light position by the inverse world matrix before passing it to the shader) and eliminates the need for tangent and bitangent (yes, bitangent, not “binormal”) vectors.
Finally, it was time to solve the cracking problem. I decided to solve it by using skirts around the edges of the UV map sections. Essentially, they’re degenerate triangles in the actual mesh (the positions are the same), but the UV coordinates are expanded to fill in some of the gaps.
Basically:
This basically puffs out each procedurally-generated area, as you can (maybe) see here (Easier to see at full size):
Thus, when the UV coordinates along the edges of these areas either go out of bounds or blend with the no-man’s-land around the texture, it blends with data that’s very close to what it’s near, hiding the cracks.
The result:
And that’s “all” there is to it!
The UV atlasifying and skirt generation will be a pre-process, so all of the vertex (mesh) data will be ready for immediate rendering into the texture after load.
Woot!
Return of the chairs!
I needed a quick test to make sure that the noise textures work well in 3D, since that’s their intended use, so I decided to run them on some chairs.
A few things to note:
That’s all!
PS – broken finger: still sucks.
This’ll be a short update. I came up with a better pavement texture, and, while trying for the stones, came up with a nice method of star generation, so I refined that as well. Hooray for happy accidents!
The stars one really looks best zoomed in (the thumbnail looks kinda lame), but I like them both!
All 100% pixel-shader generated. Both of these use pure improved perlin noise modifications to generate their look…no custom patterns like the brick and tile textures from earlier.
If you want to play around with the generator, the binary, code, and shaders are in a zip in the previous post. Have at it and let me know if you make anything awesome!
This entry was going to be a bit longer, but:

Yeah. It’s the classic tale of “boy meets girl, girl rejects boy,” except you replace “boy” with “finger,” “girl” with “wall,” and “rejects” with “breaks.”
Sometimes playing wallyball can be considered dangerous.
As part of the framework for the game I am currently writing, I’m going to have as much texture data as possible be procedural and cached in on the fly. There are a few reasons for this choice (many of which should be obvious):
I’m still working on the method, but here are a few examples:
These are all generated on-GPU, using ps_3_0 shaders. The noise implementation comes straight (thank you, copy-and-paste) from GPU Gems 2, which is an awesome book.
The idea is that objects (especially static world objects) will have unwrapped UV coordinates (like you’d use for lightmaps). To generate the textures onto the objects, I’ll do the following:
Should be pretty easy. Obviously, there are some patterns that are ridiculously difficult or even maybe impossible to generate efficiently on the GPU, so I’ll probably still use some pre-made texturemaps. But as much as I possibly can do on the GPU, I will. The main gotcha will be keeping the amount of texture info that needs to be generated to a minimum, so there aren’t any render stalls. That’s more of a level design/art problem though (which, because this is being developed lone-wolf, is also my problem).
If you want to see the shaders I’ve used and the code that I used, here is my sample app (with full source):
ProcTexGen.zip – 29KB.
The source is ridiculously uncommented because I coded it over the span of maybe 3 hours as a quick prototype, and the shaders are, I’m sure, nowhere near efficient. Also, they don’t handle negative values very well, which is why many of them add 100 to the coordinates (HACKHACKHACK).
Enjoy! And if you make any awesome textures with it, please let me know.