The subject today is texture compression. Specifically, how we can abuse it for heightmaps.
As you may recall from my other thread on the subject, the render pipeline is getting pretty full of various drawing methods, each with its own strengths and weaknesses. At the lowest level, we have normal polygon drawing, which starts out blank and gets textured in a deferred pass along with lighting. Next we have two-sided heightmaps. These guys can represent voxel data as it shows up most of the time in the natural world: a low-complexity blob of connected blocks. You can stack them together within a chunk to reproduce virtually any geometry, though most of the time we'd only be dealing with one or two per chunk. Finally, we have our conventional heightmaps, which represent the world by gross elevation alone and can't handle the concavities that occasionally show up. What they can do, however, is scale. Each patch of this large-scale heightmap is the same resolution, but the area it covers can change, from direct block representation all the way up to a single patch covering a whole face of the planet. Then, as more data about the planet streams in, more and more detail can be added. Not only is this good for streaming, it also lets the scene scale with the capabilities of the host GPU, which may or may not have much memory to work with. In theory, this system should be able to provide however much detail your machine can comfortably manage without going over.
Now, how these heightmaps work, as I've described before, is that you start with a ray at each pixel in a bounding box around the heightmap. Each ray checks its current height against the height marked in the heightmap texture, moving across the field until it crosses the height field. Then each ray returns where in the scene it is, and which block face it hit. Because this is done on a per-pixel basis, the amount of terrain on the screen doesn't affect the performance, which is why raycasters are quite handy. But this is where things get interesting. The horizontal resolution is dependent on the scale of the texture to real-world coordinates, and will always be in powers of two. The height, on the other hand, doesn't have to be tied to anything. Since rays are either above or below the height, it doesn't matter if the increments are to scale, or even if they are whole numbers; the renderer will still produce correct-looking block columns for whatever that height happens to be.
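To make the marching idea concrete, here's a minimal CPU sketch of it. The real thing runs in a fragment shader and steps texel boundaries with a DDA; this version just takes fixed steps, and the heightmap, ray origin, and step size are all made up for illustration.

```python
def march_heightfield(heights, origin, direction, step=0.25, max_t=100.0):
    """Walk a ray over a 2D grid of column heights.

    Returns (x, y, height) of the first column the ray falls below,
    or None if the ray leaves the field without crossing it.
    """
    w, h = len(heights[0]), len(heights)
    t = 0.0
    while t < max_t:
        # Current sample point along the ray.
        px = origin[0] + direction[0] * t
        py = origin[1] + direction[1] * t
        pz = origin[2] + direction[2] * t
        x, y = int(px), int(py)
        if not (0 <= x < w and 0 <= y < h):
            return None                      # ray left the patch
        if pz <= heights[y][x]:
            return (x, y, heights[y][x])     # crossed the height field
        t += step

# A 4x4 patch with one tall column; a shallow downward ray above it.
field = [[1, 1, 1, 1],
         [1, 3, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
hit = march_heightfield(field, origin=(0.0, 1.5, 4.0),
                        direction=(1.0, 0.0, -1.0), step=0.1)
```

Note that the comparison is purely "above or below", which is exactly why the stored height values don't need to be in any particular unit.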
This opens some new doors. For starters, we don't need a 16-bit value to perfectly track all the heights. Instead we could put each patch in a bounding box spanning the highest and lowest elevations within that patch. How much height difference would you normally expect across a couple of chunks? 20m? 40m? Even more importantly, natural terrain tends to be smooth once you get far enough out. This is important, because it means we can store the heightmaps in a DXTC texture format.
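Here's a sketch of that per-patch rescaling: rather than a global 16-bit height, each patch stores 8-bit values between its own min and max elevation. The numbers are illustrative; the point is that the worst-case rounding error is half a step, (hi - lo) / 255 / 2, so a 20m relief patch is accurate to under 4cm.

```python
def quantize_patch(heights):
    """Map a patch's heights onto 0..255 within its own bounding range."""
    lo, hi = min(heights), max(heights)
    span = (hi - lo) or 1.0                  # avoid /0 on a flat patch
    codes = [round((h - lo) / span * 255) for h in heights]
    return lo, hi, codes

def dequantize(lo, hi, code):
    """Recover an approximate world height from an 8-bit code."""
    return lo + (hi - lo) * code / 255.0

# A patch with ~20 m of relief.
lo, hi, codes = quantize_patch([102.0, 110.5, 122.0, 107.3])
```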
--------
So what's a DXTC format, you ask? It's a type of lossy compressed image, like JPEG, only this one can be stored and read natively by pretty much every graphics card in existence.
Here's an example of DXT1, borrowed from the Wolfire blog.

Takes you back to the 16-bit era of gaming, just a bit, but overall, almost the same image.
Now here's the raw image reduced to 128 KB via resolution reduction alone.

So DXT1 is a rather good way to shrink your video memory requirements. In this case, it achieved 6:1 compression. And, with our drawing power limited by the amount of texture we can store, that is a very good thing.
DXT1 works on 4x4 pixel blocks. Each block stores two colors, C0 and C1, and the rest of the block is represented by 2 bits per pixel, giving each pixel one of 4 modes. Mode 0 means the pixel is color C0, and mode 1 means C1. In one setting, mode 2 is halfway between C0 and C1 and mode 3 is pure black. Reversing the stored order of C0 and C1 selects an entirely different set of values: there, modes 2 and 3 sit at 1/3 and 2/3 of the way between C0 and C1. The encoder picks whichever ordering is most accurate for each block, and as you can see, it works pretty well.
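A sketch of how a block turns its two stored colors into a four-entry palette, per the two orderings above. Colors are plain RGB tuples here for clarity; real blocks store them as 16-bit 5:6:5 values, and the comparison that picks the mode is done on those raw 16-bit values.

```python
def dxt1_palette(c0, c1, c0_raw, c1_raw):
    """Return the 4 colors a DXT1 block's 2-bit codes select from."""
    lerp = lambda a, b, t: tuple(round(a[i] + (b[i] - a[i]) * t)
                                 for i in range(3))
    if c0_raw > c1_raw:
        # Four-color mode: two interpolated points at the thirds.
        return [c0, c1, lerp(c0, c1, 1/3), lerp(c0, c1, 2/3)]
    # Three-color mode: midpoint, plus black for mode 3.
    return [c0, c1, lerp(c0, c1, 0.5), (0, 0, 0)]
```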
But we're not going after color. We're here for heightmaps. Specifically, we are looking for at least 8 bits of data (256 increments) to be compressed. What we need is DXT5.
DXT5 has two sections. The first is the color section, nearly identical to DXT1. The second is the alpha section. Again there are two bounds, A0 and A1, both 8-bit, only this time the block stores 3 bits per pixel, for a total of 8 modes. Modes 0 and 1 are A0 and A1 themselves. In the first setting, based on the order of A0 and A1, modes 2-7 are evenly distributed points between A0 and A1. With the reverse setting, modes 2-5 are even increments (slightly larger ones, as there are fewer points), and modes 6 and 7 are 0 and 255, regardless of what A0 and A1 are bounded on.
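The alpha palette logic, sketched out. Exact rounding in the interpolation varies a little between decoders; integer division here is just to keep the sketch simple.

```python
def dxt5_alpha_palette(a0, a1):
    """Return the 8 alpha values a block's 3-bit codes select from."""
    if a0 > a1:
        # Eight-value mode: six evenly spaced interpolants.
        return [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    # Six-value mode: four interpolants, then the fixed extremes.
    return [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] \
                    + [0, 255]
```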
As the elevation in natural terrain is mostly smooth, especially as you get farther away, this alpha section is a good fit for height bounds. The less the height changes within each 4x4 block, the more accurate the heightmap becomes. We'll also include two full-sized floating-point bounds, which set the world heights for 0 and 255, and can be placed wherever is most optimal and produces the fewest cracks between heightmap chunks. This may sound like a lot of set-up, and there is a bit to do, but the great thing is that there's no rush: it can be done at any point on the server and then get downloaded to clients as needed. The color channel can be used to color in each column with the dominant visible block color. If we want a secondary color for the top block, like trees or grass, we need to add a second DXT1 texture.
-----------
So that is what I want to use with raycasting on the heightmap model. Color + height together is a mere 8 bits/pixel and covers the entire column for that side of the planet. Plus, we can still scale with resolution. But here we run into a problem. Earlier I mentioned a method of using lower-resolution versions of the heightmap to declare safe regions for ray traversal, where a ray is guaranteed not to hit anything. It's called an acceleration structure, and it's the secret to raytracing over large heightmaps. Unfortunately, that no longer works once you add DXT5, because the values have error in them, and the modifiers differ from block to block. With the two-sided heightmaps, I was assuming that if I could get the entire heightmap patch loaded into each multiprocessor's texture cache, the access time for each pixel would be so fast that we wouldn't even need an acceleration structure. With these full-scale heightmaps, that might not be the case. These texture patches are going to be larger than 32x32 and have full-sized bounding boxes, so an acceleration structure is much more important.
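For reference, here's the kind of structure I mean, sketched for an uncompressed heightmap: a chain of half-resolution maps where each texel stores the max height of the 2x2 texels below it. A ray flying above that max can skip the whole region without sampling the full-resolution map. (This assumes a square, power-of-two field; and with DXT5 in the mix, each max would have to be padded by the worst-case quantization error, which is exactly why it stops being a clean win.)

```python
def build_max_mips(heights):
    """Return [full-res, half-res, ...] max-reduction levels."""
    levels = [heights]
    cur = heights
    while len(cur) > 1:
        half = [[max(cur[2*y][2*x],     cur[2*y][2*x + 1],
                     cur[2*y + 1][2*x], cur[2*y + 1][2*x + 1])
                 for x in range(len(cur[0]) // 2)]
                for y in range(len(cur) // 2)]
        levels.append(half)
        cur = half
    return levels

mips = build_max_mips([[1, 2, 0, 0],
                       [3, 4, 0, 0],
                       [0, 0, 5, 0],
                       [0, 0, 0, 6]])
```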
The average texture cache is going to be 4-8 KB, so we should assume 4 KB in whatever we do. DXT5 puts 4x4 pixels in 128 bits, so a 32x32 texture patch takes up 1024 bytes, or 1536 bytes if you count the secondary color, which we probably should. 64x64 takes us to 4 KB without the secondary color. Until the triangle is drawn, nothing will be using that shader unit's resources except other parts of the same triangle, and most of the time a single triangle isn't going to explore the entire heightmap, so we have a little wiggle room. With any luck, we can go a bit over the texture cache limit and stick with 64x64 pixel textures plus the secondary color. The next level up, though, would blow out even an 8 KB texture cache, so we'll probably stop here. For a 5 km planet, that's still ~6400 texture patches per face, so we're almost there. Add the fact that we don't need anywhere near 1m resolution for the entire planet when we're out in orbit, and we finally have something which can take us all the way to space!
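The budget arithmetic above, spelled out: DXT5 is 128 bits per 4x4 block, i.e. 1 byte/pixel, and an extra DXT1 layer adds half a byte per pixel.

```python
def patch_bytes(side, with_dxt1_layer=False):
    """Memory footprint of one side x side heightmap patch."""
    pixels = side * side
    total = pixels                # DXT5: 1 byte per pixel
    if with_dxt1_layer:
        total += pixels // 2      # DXT1: 0.5 byte per pixel
    return total
```

A 64x64 patch with the secondary color comes to 6144 bytes, a bit over a 4 KB cache but under 8 KB, while 128x128 blows past 16 KB, hence stopping at 64x64.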
I'm not very happy about losing the acceleration structure, especially when the bounding box can have a ton of empty space, but the decreased-resolution maps should help keep the total raycasting time down. With hardware tessellation, DX11 cards can drop rays right in front of the correct spot on the heightmap, at which point the rays are just there to turn the mesh into blocks. That's good, but it's not going to help anyone with an average GPU, and those are unfortunately the people who need it most. We could make a second mesh which generates a closer starting point for each ray, but we need to be careful not to put down too much detail, or we could still swamp a weak GPU. Also, far enough away, we can remove the raycasting altogether and make the planet look flat; then it costs almost nothing to render. By generating a normal map from the heightmap, we can use deferred lighting to give it an indication of depth, though without the self-occlusion.
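Deriving that normal map is just central differences over the heightmap. A sketch, with a made-up `scale` factor standing in for the world-units-per-texel ratio:

```python
import math

def normal_from_heights(heights, x, y, scale=1.0):
    """Approximate the surface normal at (x, y) via central differences."""
    h, w = len(heights), len(heights[0])
    # Height slope in each direction, clamped at the patch edges.
    dx = (heights[y][min(x + 1, w - 1)] - heights[y][max(x - 1, 0)]) * 0.5
    dy = (heights[min(y + 1, h - 1)][x] - heights[max(y - 1, 0)][x]) * 0.5
    n = (-dx * scale, -dy * scale, 1.0)
    length = math.sqrt(n[0]**2 + n[1]**2 + n[2]**2)
    return tuple(c / length for c in n)
```

Flat terrain yields the straight-up normal (0, 0, 1); a slope tilts it against the gradient, which is all deferred lighting needs to fake depth on the flat far-away planet.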
--------
Anyway, just thought I would report this. Until now, the question has been: how are we going to see planets from space? Now the question is: how much detail can we put on the screen? I'm quite a bit happier with the latter.