Josh Jersild – Page 4 – JoshJers' Ramblings

Slow Progress Is Still Progress

I’ve gotten less done recently that I would have liked, due to a lack of time to sit down at my computer. However, I did implement tiling perlin noise in-shader.

It uses the same basic technique as I used to set up the tiling noise (so you can read it in one of my earlier posts).

Basically, I can generate perlin noise that tiles at any given (integer) position.

Quite handy for generating tiling textures (because not everything I’m generating needs to be 3d)

Here are some examples. It’s hard to tell that they tile without actually tiling them yourself, but they do. So there.

Click to enlarge

So I used it to put together a seamless version of my earlier brick texture:

Click to enlarge

Finally, I did the same basic trick with the Worley (cellular) noise.

The first screenshot is a large area repeating Worley, the second repeats at a very small level so it should be obvious how it tiles, even looking at the thumbnail:

Click to enlarge

Next up: Maybe I should actually do something useful with these things.

Pumpkins: Great For Pie, Great For Faces

I haven’t had time to do any more work on my project the last couple days, but I did have time to carve some faces on some pumpkins. My wife requested that the rounded one be a happy face, so I got to make some sort of surprised “OMGWTFBBQ” face on the second.

Click to enlarge

kekeke

Jpeg Buoys Amidst a Sea of Text

So I put off working on this entry long enough that it’s now two entries worth of data in one.

Too Many Instructions: Cutting Down On the Noise

So, the implementation of Improved Perlin noise from GPU Gems 2 boils down to 48 pixel shader instruction slots (9 texture, 39 arithmetic). That’s one octave of noise. What I needed, desperately, was a faster implementation of noise, where the base quality doesn’t matter (especially useful for things such as fBm and the like).

In the FIRST GPU Gems, in the chapter on Improved Perlin Noise, Ken Perlin makes a quick note about how to make a cheap approximation of perlin noise in the shader, using a volume texture. The technique is straight forward, but it took me some effort to understand exactly what was supposed to go into the volume texture.

In my case, I ended up using a 32x32x32 volume texture to simulate an 8x8x8 looping sample of perlin noise space. Essentially, when sampling this texture, divide the world position by 8, and use that as the (wrapped) texcoord into the volume.

Crazy 8s: Modifying Perlin Noise To Loop At A Specified Location

The first trick is that it has to be LOOPING Perlin noise. But how do you generate such a thing?

Turns out, in the reference implementation of Improved Noise, there are a bunch of instances where there are +1s. For instance:

A = p[X  ]+Y;
AA = p[A]+Z;
AB = p[A+1]+Z;

B = p[X+1]+Y;
BA = p[B]+Z;
BB = p[B+1]+Z;

(Later, AA, AB, BA, and BB are also accessed with +1s).

Figuring out how to make the noise wrap at a specific value (in my case, 8), was a matter of rethinking those as follows:

A = p[X  ]; // note: no +Y here
AA = p[A+Y]  (+Z); // +Z in parens because it actually gets added later, like the Y does here
AB = p[A+(Y+1)] (+Z);

B = p[X+1]; // again, no +Y
BA = p[B+Y] (+Z);
BB = p[B+(Y+1)] (+Z);

So, really, the +1s are added to the coordinate added earlier.
So, to make the noise wrap at a certain value, you need to take those (coordinate+1)s and change each into a ((coordinate+1)%repeatLocation).

The final version of the texture shader that generates noise that loops at a specific location is as follows:

// permutation table
static int permutation[] = { 151,160,137,91,90,15,
131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,
190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,
102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,
223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,
129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,
138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180
};

// gradients for 3d noise
static float3 g[] = {
    1,1,0,
    -1,1,0,
    1,-1,0,
    -1,-1,0,
    1,0,1,
    -1,0,1,
    1,0,-1,
    -1,0,-1,
    0,1,1,
    0,-1,1,
    0,1,-1,
    0,-1,-1,
    1,1,0,
    0,-1,1,
    -1,1,0,
    0,-1,-1,
};

int perm(int i)
{
	return permutation[i % 256];
}

float3 texfade(float3 t)
{
	return t * t * t * (t * (t * 6 - 15) + 10); // new curve
//	return t * t * (3 - 2 * t); // old curve
}

float texgrad(int hash, float3 p)
{
  return dot(g[hash%16], p);
}

float texgradperm(int x, float3 p)
{
	return texgrad(perm(x), p);
}

float texShaderNoise(float3 p, int repeat, int base = 0)
{
	int3 I = fmod(floor(p), repeat);
	int3 J = (I+1) % repeat.xxx;
	I += base;
	J += base;

  p -= floor(p);

  float3 f = texfade(p);

	int A  = perm(I.x);
	int AA = perm(A+I.y);
	int AB = perm(A+J.y);

 	int B  =  perm(J.x);
	int BA = perm(B+I.y);
	int BB = perm(B+J.y);

  	return lerp( lerp( lerp( texgradperm(AA+I.z, p + float3( 0,  0,  0) ),
                                 texgradperm(BA+I.z, p + float3(-1,  0,  0) ), f.x),
                           lerp( texgradperm(AB+I.z, p + float3( 0, -1,  0) ),
                                 texgradperm(BB+I.z, p + float3(-1, -1,  0) ), f.x), f.y),
                     lerp( lerp( texgradperm(AA+J.z, p + float3( 0,  0, -1) ),
                                 texgradperm(BA+J.z, p + float3(-1,  0, -1) ), f.x),
                           lerp( texgradperm(AB+J.z, p + float3( 0, -1, -1) ),
                                 texgradperm(BB+J.z, p + float3(-1, -1, -1) ), f.x), f.y), f.z);

}

Whee!

Noise + Real Numbers + Imaginary Numbers == ???

So, the second trick: the texture actually needed to contain two values (R and G channels), to act as real and imaginary parts. Very simple, I added a base parameter (in the code above) so that I could offset into a different 8x8x8 cube of noise. I drop a different 8x8x8 noise into the G channel.

Finally! We have a texture with 8x8x8 noise. But 8-cubed noise sucks, because it’s ridiculously repetative. That’s where that weird imaginary part comes into play. You sample the 8-cube volume again, but at 9x scale (so it’s lower frequency). You then use the (real component of) high-frequency as an angle (scaled by 2pi) to do a quaternion rotation on the low-frequency noise.

float noiseFast(float3 p)
{
  p /= 8; // because the volume texture is 8x8x8 noise, divide the position by 8 to keep this noise in parity with the true Perlin noise generator.
  float2 hi = tex3D(noise3dSampler, p).rg*2-1; // High frequency noise
  half   lo = tex3D(noise3dSampler, p/9).r*2-1; // Low frequency noise

  half  angle = lo*2.0*PI;
  float result = hi.r * cos(angle) + hi.g * sin(angle); // Use the low frequency as a quaternion rotation of the high-frequency's real and imaginary parts.
  return result; // done!
}

And that’s it! Compare the instruction counts of the real Perlin noise to this fast fake:

Old (high-quality):  approximately 48 instruction slots used (9 texture, 39 arithmetic)
New (lower-quality): approximately 20 instruction slots used (2 texture, 18 arithmetic)

Essentially, wherever I don’t need the full quality noise, I can halve my instruction count on noise generation. Score!

Here’s a comparison: on the left, the weird confetticrete chair with the original noise, and on the right is the new faster noise:

Old (left) vs. New (right)
Click to enlarge

They look roughly the same, there are some artifacts on the new one (the diamond-shaped red blob on the upper-right of the new chair due to the trilinear filtering), but it’s way faster.

Cellular Noise

Okay, I have some cool perlin noise stuff. But man cannot live on Perlin noise alone, so I decided to implement cellular noise, as well.

Turns out, there’s something called Worley noise which does exactly what I was hoping to do. Implementation was pretty simple.

void voronoi(float3 position, out float f1, out float3 pos1, out float f2, out float3 pos2, float jitter=.9, bool manhattanDistance = false )
{
  float3 thiscell = floor(position)+.5;
  f1 = f2 = 1000;
  float i, j, k;

  float3 c;
  for(i = -1; i <= 1; i += 1)
  {
    for(j = -1; j <= 1; j += 1)
    {
      for(k = -1; k <= 1; k += 1)
      {
        float3 testcell = thiscell  + float3(i,j,k);
        float3 randomUVW = testcell * float3(0.037, 0.119, .093);
        float3 cellnoise = perm(perm2d(randomUVW.xy)+randomUVW.z);
        float3 pos = testcell + jitter*(cellnoise-.5);
        float3 offset = pos - position;
        float dist;
        if(manhattanDistance)
          dist = abs(offset.x)+abs(offset.y) + abs(offset.z);
        else
          dist = dot(offset, offset);
        if(dist < f1)
        {
          f2 = f1;
          pos2 = pos1;
          f1 = dist;
          pos1 = pos;
        }
        else if(dist < f2)
        {
          f2 = dist;
          pos2 = pos;
        }
      }
    }
  }
  if(!manhattanDistance)
  {
    f1 = sqrt(f1);
    f2 = sqrt(f2);
  }
}

The gist is that each unit cube cell has a randomly-placed point in it. for each point being evaluated by the shader, you find the distance to the nearest point (a value called “F1”), and the distance to the next-nearest (“F2”), etc (to as many as you care about – though anything past F4 starts to look similar and uninteresting). Using linear combinations of these distances gives interesting results:

Left: F1 Right: F2
Click to enlarge

Left: F2-F1 Right: (F1+F2)/2
Click to enlarge

Something cool to do, also, is to use Manhattan distance instead of standard Euclidian distance to calculate the distance. You end up with much more angular results. Here are the same 4 calculations, using manhattan distance:

Click to enlarge

Considering that a few levels of my current project will take place in a metallic fortress, this will especially come in handy.

So, what can you do with these?

I, predictably, have made a few test textures:

Click to enlarge

Also, it still looks pretty cool if you use fBm on it. For instance:

4 octaves of F1 Worley noise

But I hear you asking “duz it wrok n 3deez, Drilian?!?!?!” Oh, I assure you it does!

Click to enlarge

And now I hear you asking “Can u stop typing nau? I is tir0d of reedin.” (or alternately, “I is tir0d uv looking @ imagez sparsely scattered thru the text taht I dun feel liek reedin.”) To this, I say: Sure, but it worries me that you’re asking your questions in some form of lolcat.

That’s all I got.

Short Skirt, Long Jacket

So yesterday I got the crack filling up and running.

Tonight, I improved the routine dramatically.

The Trouble With Texcoords

The problem was, the edge-expanding algorithm I used was detecting way more edges than it needed to. Here’s an image of a normal map generated using this (old, bad) method (I made it render ONLY the skirts, for illustration):

Click to enlarge

As you can see, way more edges through the UV charts were getting expanded than necessary. This was messing up the maps, because there were angles and edges where there didn’t need to be, and it was introducing artifacts, especially at lower mip levels.

The problem arose because each of those “extra” edges marked areas where the vertex positions were the same, but the texcoords were different. Since the original algorithm was using the vertex’s index as the identifying feature, each time there was a texcoord change meant that the indices for neighboring triangles were different, blah blah blah, you get the point.

UV: Vectors, Not Rays

Basically, the system was rewritten to glom together vertices with the same uv map coordinates, and treat them as one single vertex. All of those interior edges get discarded. Because a single “vertex” could actually be composed of multiple source vertices, the edge expanding code had to be modified to take that into account.

Here’s the old way again, followed by the NEW way (And then the new way completely filled in):

Click to enlarge

As you can see, they’re now proper outlines (not outandsometimesinlines), and the actual outer areas are much cleaner.

I Don’t Think That Clown Is Healthy

So, here’s a new render (and its diffuse map). I modified the concrete because I was sick of all of my pictures being grayscale, so here’s my artist’s rendition of “Gray Chair That A Clown Puked Onto”:

Click to enlarge

That’s all! I’m going to release the code that I’m using for all of this, but I want to clean it up just a bit, and add variable gutter width support (instead of the lame hardcoded way that I have it now).

But for now…away!

The Big Procedural Easy

I took it easy today, so I was barely near the computer, but I did make some awesome progress.

Last night, I was able to finally get a prototype of my texture caching setup going.

Diffusing the Procedural Situation Using Bad Puns

Right now, it’s a command-line tool that does the following:

Loads up a mesh and UV atlases it to get unique texture coordinates for the entire mesh (similar to what you’d do for lightmapping
Loads a D3DX effect
Renders the mesh into the a render target, using the UV atlas texcoords as position, using the actual model’s position/normal as shader inputs to generate the noise
Writes both the UV atlased mesh and the rendered texture to file

Simple enough. What I ended up with was as follows:

Click to enlarge

Not bad, but for two things:

No normal mapping (per-vertex normals only)
Cracks along the seams of the UV maps.

Both are solvable problems, and I opted to tackle the normal mapping first.

Returning to Normalcy

How does one generate a normal map with a procedural function?

In my case, I have the procedural function not only generate a color but a height. Generating three heights in close proximity (using (pos), (pos+tangent*scaler), (pos+bitangent*scaler)) gives me two edges which I can take the cross-product of to get a pixel normal map. Adding this gave me some better shading (but didn’t fix the cracks):

Click to enlarge

The normal map generated is in object space (though it could easily be in world space, assuming a static object). This simplifies the lighting code (I simply transform the light position by the inverse world matrix before passing it to the shader) and eliminates the need for tangent and bitangent (yes, bitangent, not “binormal”) vectors.

Cracks are Unappealing on Plumbers AND Procedurally-Textured Models

Finally, it was time to solve the cracking problem. I decided to solve it by using skirts around the edges of the UV map sections. Essentially, they’re degenerate textures in the actual mesh (the positions are the same), but the UV coordinates are expanded to fill in some of the gapping.

Basically:

Use your favorite method to get a list of edges that are only used once
Use these edges to generate “UV normals” for each vertex (which has two edges, one leading in and one leading out), which are basically ( perpendicular[(edge+edge2)/2] ).
duplicate each vertex, move its UV coordinate some distance along this uv normal
Create new strips of indices, using the old and new
render these into the UV map first, before rendering the standard data

This basically puffs out each procedurally-generated area, as you can (maybe) see here (Easier to see at full size):

Click to enlarge

Thus, when the UV coordinates along the edges of these areas either go out of bounds or blend with the no-man’s-land around the texture, it blends with data that’s very close to what it’s near, hiding the cracks.

The result:

Click to enlarge

And that’s “all” there is to it!

The UV atlasifying and skirt generation will be a pre-process, so all of the vertex (mesh) data will be ready for immediate rendering into the texture after load.

Woot!

Sometimes They Come Back

Return of the chairs!

I needed a quick test to make sure that the noise textures work well in 3D, since that’s their intended use, so I decided to run them on some chairs.

A few things to note:

These aren’t currently being written to texture first (which is the ultimate idea) so the chairs in the background have a certain amount of…sparkle.
Also, this means that this is SLOW. This scene brought my 8800GTX to its little silicon knees. The concrete shader is 1906 instructions (way over the sm3 guaranteed min spec of 512), 306 texture and 1600 arithmetic, so it’s a bit…intense.
The lighting looks a little weird. I’m not sure if it’s an artifact or if it’s right and I’m just imagining things, but there you go.

Click to enlarge

That’s all!

PS – broken finger: still sucks.

More Textures!

This’ll be a short update. I came up with a better pavement texture, and, while trying for the stones, came up with a nice method of star generation, so I refined that as well. Hooray for happy accidents!

The stars one really looks best zoomed in (the thumbnail looks kinda lame), but I like them both!

Click to enlarge

All 100% pixel-shader generated. Both of these use pure improved perlin noise modifications to generate their look…no custom patterns like the brick and tile textures from earlier.

If you want to play around with the generator, the binary, code, and shaders are in a zip in the previous post. Have at it and let me know if you make anything awesome!

Procedual Terseness

The Most Communicative Of Fingers

This entry was going to be a bit longer, but:

OWW MY LMF ASPLODE!

Yeah. It’s the classic tale of “boy meets girl, girl rejects boy,” except you replace “boy” with “finger,” “girl” with “wall,” and “rejects” with “breaks.”

Sometimes playing wallyball can be considered dangerous.

Procedural Textures

As part of the framework for the game I am currently writing, I’m going to have as much texture data as possible be procedural and cached in on the fly. There are a few reasons for this choice (many of which should be obvious):

Less disk usage – very useful if I hit my target of, oh, say, a certain game console
Non-repeating textures – textures don’t have to tile. I can keep caching in new ones.
Seriously, I suck at texture art – This way, the computer does it for me!

I’m still working on the method, but here are a few examples:

Click to enlarge

These are all generated on-GPU, using ps_3_0 shaders. The noise implementation comes straight (thank you, copy-and-paste) from GPU Gems 2, which is an awesome book.

The idea is that objects (especially static world objects) will have unwrapped UV coordinates (like you’d use for lightmaps). To generate the textures onto the objects, I’ll do the following:

Create a texture that is the requisite size (or pull it out of a pool, which is more likely
Render the objects into the texture, using the UV coordinates as the position (scaled from [0,1] to [-1,1] of course).
Pass the position and/or normal to the pixel shader, use it to generate the texture data
Repeat for as many textures as the object needs (some combination of diffuse color, normal, height, glossiness, etc).

Should be pretty easy. Obviously, there are some patterns that are ridiculously difficult or even maybe impossible to generate efficiently on the GPU, so I’ll probably still use some pre-made texturemaps. But as much as I possibly can do on the GPU, I will. The main gotcha will be keeping the amount of texture info that needs to be generated to a minimum, so there aren’t any render stalls. That’s more of a level design/art problem though (which, because this is being developed lone-wolf, is also my problem).

If you want to see the shaders I’ve used and the code that I used, here is my sample app (with full source):

ProcTexGen.zip – 29KB.

The source is ridiculously uncommented because I coded it over the span of maybe 3 hours as a quick prototype, and the shaders are, I’m sure, nowhere near efficient. Also, they don’t handle negative values very well, which is why many of them add 100 to the coordinates (HACKHACKHACK).

Enjoy! And if you make any awesome textures with it, please let me know 🙂

D3D10 + Deferred Shading = MSAA!

For the last few days I’ve been working at learning D3D10 and using it to whip up a quick prototype of doing fully-deferred shading while using MSAA (multisample antialiasing). If you’re not sure what deferred shading is, but are curious, check out the deferred shading paper at NVIDIA’s developer site. Here are my notes on the experience.

On D3D10

D3D10 is very, very well-designed. It has some interesting new bits of functionality, but the way that the API has been rearranged is really quite nice, though it does tend to produce MUCH more verbose code.

One thing in particular that is very different is the buffer structure. rather than creating a buffer of a specific type (IDirectDrawVertexBuffer9, etc), you simply create a generic buffer (ID3D10Buffer). When you create a buffer, you specify the flags with which it can be bound (as a vertex buffer, index buffer, render target, constant buffer [another new feature], etc).

For instance, here’s my helper function to create a vertex buffer:

HRESULT CreateVertexBuffer(void *initialData, DWORD size, ID3D10Buffer** vb)
{
  D3D10_BUFFER_DESC bd;
  bd.Usage = D3D10_USAGE_IMMUTABLE; //  This tells it that the buffer will be filled with data on initialization and never updated again.
  bd.ByteWidth = size;
  bd.BindFlags = D3D10_BIND_VERTEX_BUFFER; // This tells it that it will be used as a vertex buffer
  bd.CPUAccessFlags = 0;
  bd.MiscFlags = 0;

  // Pass the pointer to the vertex data into the creation
  D3D10_SUBRESOURCE_DATA vInit;
  vInit.pSysMem = initialData;

  return g_device->CreateBuffer( &bd, &vInit, vb);
}

It’s pretty straightforward, but you can see that it’s a tad more verbose than a single-line call to IDirect3DDevice9::CreateVertexBuffer.

Another thing that I really like is the whole constant buffer idea. Basically, when passing state to shaders, rather than setting individual shader states, you build constant buffers, which you apply to constant buffer slots (15 buffer slots that can hold 4096 constants each – which adds up to a crapton of constants). So you can have different constant blocks that you can Map/Unmap (the D3D10 version of Lock/Unlock) to write data into, and you can update them based on frequency. For instance, I plan to have a cblocks that are per-world, per-frame, per-material, and per-object.

But the feature that’s most relevant to this project is this little gem:
You can read individual samples from a multisample render target.
This is what allows you to do deferred shading with true multisample anti-aliasing in D3D10.

The only thing that really, really sucks about D3D10 is the documentation. It is missing a lot of critical information, some of the function definitions are wrong, sample code has incorrect variable names, etc, etc. It’s good at giving a decent overview, but when you start to drill into specifics, there’s still a lot of work to be done.

SV_Position: What Is It Good For (Absolutely Lots!)

SV_Position is the D3D10 equivalent of the POSITION semantic: it’s what you write out of your vertex shader to set the vertex position.

However, you can also use it in a pixel shader. But what set of values does it contain when it reaches the pixel shader? The documentation was (unsurprisingly) not helpful in determining this.

Quite simply, it gives you viewport coordinates. That is, x and y will give you the absolute coordinates of the current texel you’re rendering in the framebuffer (if your framebuffer is 640×480, then a SV_Position.xy in the middle would be (320×240)).

The Z coordinate is a viewport Z coordinate (if your viewport’s MinZ is 0.5 and your MaxZ is 1, then this z coordinate will be confined to that range as well).

The W coordinate I’m less sure about – it seemed to be the (interpolated) w value from the vertex shader, but I’m not positive on that.

I thought this viewport-coordinate thing was a tad odd…I mean, who cares which absolute pixel you’re at on the view? Why not just give me a [0..1] range? As it turns out, when sampling multisample buffers, you actually DO care, because you don’t “sample” them. You “load” them.

Doing a texture Load does not work quite like doing a texture Sample. Load takes integer coordinates that correspond to the absolute pixel value to read. Load is also the only way to grab a specific sample out of the pack.

But, in conjunction with our delicious SV_Position absolute-in-the-render-target coordinates, you have exactly the right information!

Pulling a given sample out of the depth texture is as easy as:

int sample; // this contains the index of the sample to load.  If this is a 4xAA texture, then sample is in the range [0, 3].
VertexInput i; // i.position is the input SV_Position.  It contains the absolute pixel coordinates of the current render.
texture2DMS<float, NUMSAMPLES> depthTexture; // This is the depth texture - it's a 2D multi-sample texture, defined as
                                             // having a single float, and having NUMSAMPLES samples

// Here's the actual line of sampling code
float  depth   = depthTexture.Load(int3((int2)i.position.xy, 0), sample).x;

Simple! I do believe it is for exactly this type of scenario (using Load to do postprocess work) that SV_Position in the PS was designed the way it is. Another mystery of the universe solved. Next on the list: “What makes creaking doors so creepy?”

Workin’ It

Simply running the deferred algorithm for each sample in the deferred GBuffers’ current texel and averaging them together works just fine. That gets you the effect with a minimum of hassle. But I felt that it could be optimized a bit.

The three deferred render targets that get used in this demo are the unlit diffuse color buffer (standard A8R8G8B8), the depth render (R32F), and the normal map buffer (A2R10G10B10). The depth render is not necessary in D3D10 when there is no multisampling, because you can read from a non-ms depth buffer in D3D10. However, you can’t map a multisampled depth buffer as a texture, so I have to render depth on my own.

Anyway, I wanted to have a flag that denoted whether or not a given location’s samples were different or not. That is, if it’s along a poly edge, the samples are probably different. But, due to the nature of multisampling, if a texel is entirely within a polygon’s border, all of the samples will contain the same data. There is really no need to do deferred lighting calculations on MULTIPLE samples when one would do just fine. So I added a pass that runs through each pixel and tests the color and depth samples for differences. If there ARE differences, it writes a 1 to the previously-useless 2-bit alpha channel in the normal map buffer. Otherwise, it writes a 0.

What this does, is allows me to selectively decide whether to do the processing on multiple samples (normalMap.w == 1) or just a single one (normalMap.w == 0).

Here is a visualization:

Click to enlarge

I’ve tinted it red where extra work is done (the shading is done per-sample) and blue where shading is only done once.

This didn’t have the massive performance boost that I was expecting – I figured having a single pass through almost all samples then only loading them as-needed would save massive amounts of texture bandwidth during the lighting phase, as well as cutting down on the processing.

I was half-right.

In fact, the performance boost was much smaller than expected. The reason is, I’ve guessed, is that when caching the multisample texture, it caches all of the samples (because it’s likely that they’ll all be read under normal circumstances), so it really doesn’t cut down on the memory bandwidth at all. What it DOES cut down on is the processing which, as the lighting gets more complex (shadowing is added, etc), WILL become important. Also, since my shader is set up to be able to do up to 8 lights in a single pass, It renders 25 full-scene directional lights (in…4 passes) at about 70fps with 4xAA at 1280×964 (maximized window, so not the whole screen) on my 8800GTX. As a comparison, it’s about 160 fps without the AA.

With a more reasonable 4 lights (single-pass) it’s 160fps at that resolution with AA, and 550 without. Not bad at all!

Here are two screenshots, one with AA, one without (respectively). Note that they look exactly the same in thumbnails. I could have probably used the same thumbnail for them, but whatever 🙂

Click to enlarge

And here it is!

Crappy D3D10 Deferred-with-AA demo (with hideous, hideous source!)

Pressing F5 will toggle the AA on and off (it just uses 4xAA). It defaults to off.

The Topics Grew In Fields

A few things to talk about this entry (no screenshots, but there is an MP3 later):

Cube Farm

Cube farm is declared complete. I decided not to add any polish to it, because it was simply a test of my game development framework. It gave me a large list of things that I need to work on (better UI layout control, ability to put 3D objects into the UI, menuing systems, better input setup, etc) before moving on to the next game. Which brings us to…

Next Project

In keeping with my Konami-inspired game development theme, the next game that I’m planning will be a side-scrolling space shoot-em-up similar to Gradius.

The plan, however, is to have all of the entities and backgrounds be full 3D and make use of some of the higher-level features of my framework (some of which are actually implemented). These features include (but are not limited to):

Procedurally-generated textures (on-the-fly, cached in, no tiling)
instancing
Multi-core support
Procedural geometry

New Song

I was finally able to break through my composer’s block and get something (ANYTHING!) composed. I like it, though it’s a bit longer than it needs to be. Oh well, that I got anything written at all is good enough right now. I’ve been on a dry streak since the end of Mop of Destiny.

MP3: Longing – 4:31 (6.5 MB)