The subject today is texture compression. Specifically, how we can abuse it for heightmaps.
As you may recall from my other thread on the subject, the render pipeline is getting pretty full of various drawing methods, each with its own strengths and weaknesses. At the lowest level, we have normal polygon drawing, which starts out blank and gets textured in a deferred pass along with lighting. Next we have two-sided heightmaps. These guys can represent voxel data as it shows up most of the time in the natural world: a low-complexity blob of connected blocks. You can stack them together within a chunk to reproduce virtually any geometry, though most of the time we'd only be dealing with one or two per chunk. Finally, we have our conventional heightmaps, which represent the world by gross elevation alone and can't handle the concavities that occasionally show up. What they can do, however, is scale. Each patch of this large-scale heightmap is the same resolution, but the area it covers can change, from direct block representation all the way up to a single patch covering a whole face of the planet. Then, as more data about the planet streams in, more and more detail can be added. Not only is this good for streaming, it also lets the scene scale with the capabilities of the host GPU, which may or may not have much memory to work with. In theory, this system should be able to provide however much detail your machine can comfortably manage without going over.
Now, how these heightmaps work, as I've described before, is that you start with a ray at each pixel in a bounding box around the heightmap. Each ray checks its current height against the height marked in the heightmap texture, moving across the field until it crosses the height field. Then each ray returns where in the scene it is, and which block face it hit. Because this is done on a per-pixel basis, the amount of terrain on the screen doesn't affect the performance, which is why raycasters are quite handy. But this is where things get interesting. The horizontal resolution is dependent on the scale of the texture to real-world coordinates, and will always be in powers of two. The height, on the other hand, doesn't have to be tied to anything. Since rays are either above or below the height, it doesn't matter if the increments are to scale, or even if they are whole numbers; the renderer will still produce correct-looking block columns for whatever that height happens to be.
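To make the marching idea concrete, here's a minimal CPU sketch of it. The real thing runs in a fragment shader and steps texel boundaries with a DDA; this version just takes fixed steps, and the heightmap, ray origin, and step size are all made up for illustration.

```python
def march_heightfield(heights, origin, direction, step=0.25, max_t=100.0):
    """Walk a ray over a 2D grid of column heights.

    Returns (x, y, height) of the first column the ray falls below,
    or None if the ray leaves the field without crossing it.
    """
    w, h = len(heights[0]), len(heights)
    t = 0.0
    while t < max_t:
        # Current sample point along the ray.
        px = origin[0] + direction[0] * t
        py = origin[1] + direction[1] * t
        pz = origin[2] + direction[2] * t
        x, y = int(px), int(py)
        if not (0 <= x < w and 0 <= y < h):
            return None                      # ray left the patch
        if pz <= heights[y][x]:
            return (x, y, heights[y][x])     # crossed the height field
        t += step

# A 4x4 patch with one tall column; a shallow downward ray above it.
field = [[1, 1, 1, 1],
         [1, 3, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
hit = march_heightfield(field, origin=(0.0, 1.5, 4.0),
                        direction=(1.0, 0.0, -1.0), step=0.1)
```

Note that the comparison is purely "above or below", which is exactly why the stored height values don't need to be in any particular unit.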
This opens some new doors. For starters, we don't need a 16-bit value to perfectly track all the heights. Instead we could put each patch in a bounding box spanning the highest and lowest elevations within that patch. How much height difference would you normally expect across a couple of chunks? 20m? 40m? Even more importantly, natural terrain tends to be smooth once you get far enough out. This is important, because it means we can store the heightmaps in a DXTC texture format.
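Here's a sketch of that per-patch rescaling: rather than a global 16-bit height, each patch stores 8-bit values between its own min and max elevation. The numbers are illustrative; the point is that the worst-case rounding error is half a step, (hi - lo) / 255 / 2, so a 20m relief patch is accurate to under 4cm.

```python
def quantize_patch(heights):
    """Map a patch's heights onto 0..255 within its own bounding range."""
    lo, hi = min(heights), max(heights)
    span = (hi - lo) or 1.0                  # avoid /0 on a flat patch
    codes = [round((h - lo) / span * 255) for h in heights]
    return lo, hi, codes

def dequantize(lo, hi, code):
    """Recover an approximate world height from an 8-bit code."""
    return lo + (hi - lo) * code / 255.0

# A patch with ~20 m of relief.
lo, hi, codes = quantize_patch([102.0, 110.5, 122.0, 107.3])
```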
--------
So what's a DXTC format, you ask? It's a type of lossy compressed image, like JPEG, only this one can be stored and read natively by pretty much every graphics card in existence.
Here's an example of DXT1, borrowed from the Wolfire blog.

Takes you back to the 16-bit era of gaming, just a bit, but overall, almost the same image.
Now here's the raw image reduced to 128 KB via resolution reduction alone.

So DXT1 is a rather good way to shrink your video memory requirements. In this case, it achieved 6:1 compression. And, with our drawing power limited by the amount of texture we can store, that is a very good thing.
DXT1 works on 4x4 pixel blocks. Each block stores two colors, C0 and C1, and the rest of the block is represented by 2 bits per pixel, giving each pixel one of 4 modes. Mode 0 means the pixel is color C0, and mode 1 means C1. In one setting, mode 2 is halfway between C0 and C1 and mode 3 is pure black. Reversing the stored order of C0 and C1 selects an entirely different set of values: there, modes 2 and 3 sit at 1/3 and 2/3 of the way between C0 and C1. The encoder picks whichever ordering is most accurate for each block, and as you can see, it works pretty well.
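A sketch of how a block turns its two stored colors into a four-entry palette, per the two orderings above. Colors are plain RGB tuples here for clarity; real blocks store them as 16-bit 5:6:5 values, and the comparison that picks the mode is done on those raw 16-bit values.

```python
def dxt1_palette(c0, c1, c0_raw, c1_raw):
    """Return the 4 colors a DXT1 block's 2-bit codes select from."""
    lerp = lambda a, b, t: tuple(round(a[i] + (b[i] - a[i]) * t)
                                 for i in range(3))
    if c0_raw > c1_raw:
        # Four-color mode: two interpolated points at the thirds.
        return [c0, c1, lerp(c0, c1, 1/3), lerp(c0, c1, 2/3)]
    # Three-color mode: midpoint, plus black for mode 3.
    return [c0, c1, lerp(c0, c1, 0.5), (0, 0, 0)]
```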
But we're not going after color. We're here for heightmaps. Specifically, we are looking for at least 8 bits of data (256 increments) to be compressed. What we need is DXT5.
DXT5 has two sections. The first is the color section, nearly identical to DXT1. The second is the alpha section. Again there are two bounds, A0 and A1, both 8-bit, only this time the block stores 3 bits per pixel, for a total of 8 modes. Modes 0 and 1 are A0 and A1 themselves. In the first setting, based on the order of A0 and A1, modes 2-7 are evenly distributed points between A0 and A1. With the reverse setting, modes 2-5 are even increments (slightly larger ones, as there are fewer points), and modes 6 and 7 are 0 and 255, regardless of what A0 and A1 are bounded on.
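The alpha palette logic, sketched out. Exact rounding in the interpolation varies a little between decoders; integer division here is just to keep the sketch simple.

```python
def dxt5_alpha_palette(a0, a1):
    """Return the 8 alpha values a block's 3-bit codes select from."""
    if a0 > a1:
        # Eight-value mode: six evenly spaced interpolants.
        return [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    # Six-value mode: four interpolants, then the fixed extremes.
    return [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] \
                    + [0, 255]
```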
As the elevation in natural terrain is mostly smooth, especially as you get farther away, this alpha section is a good fit for height bounds. The less the height changes within each 4x4 block, the more accurate the heightmap becomes. We'll also include two full-sized floating-point bounds, which set the world heights for 0 and 255, and can be placed wherever is most optimal and produces the fewest cracks between heightmap chunks. This may sound like a lot of set-up, and there is a bit to do, but the great thing is that there's no rush: it can be done at any point on the server and then get downloaded to clients as needed. The color channel can be used to color in each column with the dominant visible block color. If we want a secondary color for the top block, like trees or grass, we need to add a second DXT1 texture.
-----------
So that is what I want to use with raycasting on the heightmap model. Color + height together is a mere 8 bits/pixel and covers the entire column for that side of the planet. Plus, we can still scale with resolution. But here we run into a problem. Earlier I mentioned a method of using lower-resolution versions of the heightmap to declare safe regions for ray traversal, where a ray is guaranteed not to hit anything. It's called an acceleration structure, and it's the secret to raytracing over large heightmaps. Unfortunately, that no longer works once you add DXT5, because the values have error in them, and the modifiers differ from block to block. With the two-sided heightmaps, I was assuming that if I could get the entire heightmap patch loaded into each multiprocessor's texture cache, the access time for each pixel would be so fast that we wouldn't even need an acceleration structure. With these full-scale heightmaps, that might not be the case. These texture patches are going to be larger than 32x32 and have full-sized bounding boxes, so an acceleration structure is much more important.
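For reference, here's the kind of structure I mean, sketched for an uncompressed heightmap: a chain of half-resolution maps where each texel stores the max height of the 2x2 texels below it. A ray flying above that max can skip the whole region without sampling the full-resolution map. (This assumes a square, power-of-two field; and with DXT5 in the mix, each max would have to be padded by the worst-case quantization error, which is exactly why it stops being a clean win.)

```python
def build_max_mips(heights):
    """Return [full-res, half-res, ...] max-reduction levels."""
    levels = [heights]
    cur = heights
    while len(cur) > 1:
        half = [[max(cur[2*y][2*x],     cur[2*y][2*x + 1],
                     cur[2*y + 1][2*x], cur[2*y + 1][2*x + 1])
                 for x in range(len(cur[0]) // 2)]
                for y in range(len(cur) // 2)]
        levels.append(half)
        cur = half
    return levels

mips = build_max_mips([[1, 2, 0, 0],
                       [3, 4, 0, 0],
                       [0, 0, 5, 0],
                       [0, 0, 0, 6]])
```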
The average texture cache is going to be 4-8 KB, so we should assume 4 KB in whatever we do. DXT5 puts 4x4 pixels in 128 bits, so a 32x32 texture patch takes up 1024 bytes, or 1536 bytes if you count the secondary color, which we probably should. 64x64 takes us to 4 KB without the secondary color. Until the triangle is drawn, nothing will be using that shader unit's resources except other parts of the same triangle, and most of the time a single triangle isn't going to explore the entire heightmap, so we have a little wiggle room. With any luck, we can go a bit over the texture cache limit and stick with 64x64 pixel textures plus the secondary color. The next level up, though, would blow out even an 8 KB texture cache, so we'll probably stop here. For a 5 km planet, that's still ~6400 texture patches per face, so we're almost there. Add the fact that we don't need anywhere near 1m resolution for the entire planet when we're out in orbit, and we finally have something which can take us all the way to space!
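The budget arithmetic above, spelled out: DXT5 is 128 bits per 4x4 block, i.e. 1 byte/pixel, and an extra DXT1 layer adds half a byte per pixel.

```python
def patch_bytes(side, with_dxt1_layer=False):
    """Memory footprint of one side x side heightmap patch."""
    pixels = side * side
    total = pixels                # DXT5: 1 byte per pixel
    if with_dxt1_layer:
        total += pixels // 2      # DXT1: 0.5 byte per pixel
    return total
```

A 64x64 patch with the secondary color comes to 6144 bytes, a bit over a 4 KB cache but under 8 KB, while 128x128 blows past 16 KB, hence stopping at 64x64.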
I'm not very happy about losing the acceleration structure, especially when the bounding box can have a ton of empty space, but the decreased-resolution maps should help keep the total raycasting time down. With hardware tessellation, DX11 cards can drop rays right in front of the correct spot on the heightmap, at which point the rays are just there to turn the mesh into blocks. That's good, but it's not going to help anyone with an average GPU, and those are unfortunately the people who need it most. We could make a second mesh which generates a closer starting point for each ray, but we need to be careful not to put down too much detail, or we could still swamp a weak GPU. Also, far enough away, we can remove the raycasting altogether and make the planet look flat; then it costs almost nothing to render. By generating a normal map from the heightmap, we can use deferred lighting to give it an indication of depth, though without the self-occlusion.
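Deriving that normal map is just central differences over the heightmap. A sketch, with a made-up `scale` factor standing in for the world-units-per-texel ratio:

```python
import math

def normal_from_heights(heights, x, y, scale=1.0):
    """Approximate the surface normal at (x, y) via central differences."""
    h, w = len(heights), len(heights[0])
    # Height slope in each direction, clamped at the patch edges.
    dx = (heights[y][min(x + 1, w - 1)] - heights[y][max(x - 1, 0)]) * 0.5
    dy = (heights[min(y + 1, h - 1)][x] - heights[max(y - 1, 0)][x]) * 0.5
    n = (-dx * scale, -dy * scale, 1.0)
    length = math.sqrt(n[0]**2 + n[1]**2 + n[2]**2)
    return tuple(c / length for c in n)
```

Flat terrain yields the straight-up normal (0, 0, 1); a slope tilts it against the gradient, which is all deferred lighting needs to fake depth on the flat far-away planet.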
--------
Anyway, just thought I would report this. Until now, the question has been: how are we going to see planets from space? Now the question is: how much detail can we put on the screen? I'm quite a bit happier with the latter.