Another rendering update
Posted: Sat Dec 29, 2012 5:04 am
I'll make a proper development log before too long, but for now I'll just post some progress here.
I've been revisiting the render pipeline the last few days, ironing out some details and expanding on what was already worked out.
You may remember that the opaque cube renderer, which is the bread and butter of the terrain rendering, completely separates the geometry data from the texture data so that both can be optimized independently. The surface data is stored in two massive textures (the default size for desktop GPUs will be 4096x4096, totaling 112MB; if you have a ton of onboard memory, we could take it up to 8192x8192, or 448MB). If you've seen anything about the megatextures in Rage, this is similar: subdivide a giant virtual texture into visible patches and use a secondary lookup texture and custom shaders to locate them in the onboard texture space. In this case, patches are 8x8 and there are 16 of them per layer of each 32^3 super chunk. There can be up to 16 layers, and layers can be merged together or reused if there is no overlap.
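To make the indirection a little more concrete, here's a rough C++ sketch of the kind of lookup involved. The post only pins down 8x8 patches and 16 of them per layer; the 4x4 patch tiling, the PatchEntry table, and every name below are my own illustrative assumptions, not the engine's actual code.

```cpp
#include <cstdint>

struct PatchEntry {
    uint16_t atlas_x;   // patch origin (in texels) inside the big onboard surface texture
    uint16_t atlas_y;
};

constexpr int PATCH_SIZE = 8;   // patches are 8x8 texels
constexpr int LAYER_SIZE = 32;  // assumption: a layer is a 32x32 texel sheet, i.e. 4x4 patches = 16

// Resolve a texel coordinate local to one layer of a super chunk into an
// absolute coordinate inside the onboard surface texture.
inline void resolve_texel(const PatchEntry layer_lookup[16],  // one entry per 8x8 patch
                          int local_x, int local_y,           // 0..31 within the layer
                          int& out_x, int& out_y)
{
    int patch_index = (local_y / PATCH_SIZE) * (LAYER_SIZE / PATCH_SIZE)
                    + (local_x / PATCH_SIZE);
    const PatchEntry& p = layer_lookup[patch_index];
    out_x = p.atlas_x + (local_x % PATCH_SIZE);
    out_y = p.atlas_y + (local_y % PATCH_SIZE);
}
```

In the real thing this indirection happens in the shader against a lookup texture rather than a CPU-side table, but the idea is the same.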
The geometry part is cut down to the bare bones, and it doesn't need much, since this pipeline handles opaque, full-sized cubes exclusively. Vertices are stored in 3x8 bits, in coordinates local to the super chunk. Each coordinate only has a range of 0-33, but 8 bits is the smallest a vertex coordinate can go. We also include another 8 bits of padding, which I might use for something later on. Face normals won't work properly on cubes (I spent a few days trying). Instead, I calculate the face on a per-pixel basis: one of the xyz coordinates in model space will be a whole number, since each pixel is resting on a block face.
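As a rough illustration (not the actual engine code), here's what the 3x8-bit packing and the per-pixel face recovery could look like; the byte order and the face_axis helper are my own guesses, only the 3x8 + 8 padding split comes from the post.

```cpp
#include <cmath>
#include <cstdint>

// Pack one cube-mesh vertex into 32 bits: three 8-bit super-chunk-local
// coordinates plus 8 bits of padding in the high byte.
inline uint32_t pack_vertex(uint8_t x, uint8_t y, uint8_t z)
{
    return uint32_t(x) | (uint32_t(y) << 8) | (uint32_t(z) << 16);
}

// Per-pixel face recovery, shown on the CPU for clarity: on a cube face,
// exactly one model-space coordinate sits on a whole number, and that axis
// is the axis of the face normal.
inline int face_axis(float x, float y, float z)
{
    float fx = std::fabs(x - std::round(x));
    float fy = std::fabs(y - std::round(y));
    float fz = std::fabs(z - std::round(z));
    if (fx <= fy && fx <= fz) return 0;  // +/-X face
    if (fy <= fz)             return 1;  // +/-Y face
    return 2;                            // +/-Z face
}
```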
Model meshing is really complicated and I won't get into it here, but if you are curious, I am using methods quite similar to the ones discussed on 0fps. Take a look if you haven't seen it. Originally, I had wanted to use indexed triangle meshes and minimize the number of vertices. However, after running the numbers, our vertices are so small that the indices alone make up a significant part of the mesh definition. To minimize the total size, which is the goal, you have to look at the whole picture. It's a complicated mess, but long story short, the greedy rectangle mesh wins out.
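For the curious, here's a stripped-down sketch of greedy rectangle meshing over one 32x32 slice of faces, roughly in the spirit of the 0fps articles. A real implementation would also require merged cells to share the same block/texture, which I've simplified away here; all names are illustrative.

```cpp
#include <array>
#include <vector>

constexpr int N = 32;

struct Quad { int x, y, w, h; };  // rectangle of merged faces within the slice

// mask[y][x] marks face cells that still need a quad; cells are cleared as they
// are absorbed into rectangles, so each face is emitted exactly once.
std::vector<Quad> greedy_mesh_slice(std::array<std::array<bool, N>, N>& mask)
{
    std::vector<Quad> quads;
    for (int y = 0; y < N; ++y) {
        for (int x = 0; x < N; ++x) {
            if (!mask[y][x]) continue;

            // Grow the rectangle as wide as possible along x...
            int w = 1;
            while (x + w < N && mask[y][x + w]) ++w;

            // ...then as tall as possible along y, as long as every row is full.
            int h = 1;
            bool grow = true;
            while (y + h < N && grow) {
                for (int k = 0; k < w; ++k)
                    if (!mask[y + h][x + k]) { grow = false; break; }
                if (grow) ++h;
            }

            // Clear the merged cells so they aren't emitted twice.
            for (int dy = 0; dy < h; ++dy)
                for (int dx = 0; dx < w; ++dx)
                    mask[y + dy][x + dx] = false;

            quads.push_back({x, y, w, h});
        }
    }
    return quads;
}
```

With vertices this small, a handful of big rectangles beats a pile of indexed triangles, which is the whole point.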
------------------------------
Anyway, all that is mostly review. Now for the new stuff.
I am now separating the surface data and the geometry even further by making it a two-pass system. The first pass writes geometry to the depth buffer, along with several other buffers which contain the normals and most of the resolved lookup coordinates. The second pass applies the textures and any dynamic lighting we may add. By doing it this way, we can mix and match any number of geometry models on a per-chunk basis, as well as ensure that only the final tally of visible stuff gets textured. Another advantage of writing the lookup data to the buffers rather than figuring it out later is that you can work in different spaces. Since this needs to work at the range of kilometers, it may be better to do local operations on a smaller scale to keep floating point precision error under control. Then, by storing the final answer in the buffers, the second pass can work exclusively in screen space rather than the potentially gigantic world space.
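Here's a loose C++-flavored sketch of the split, just to show the shape of the idea; the buffer layout and the names (GBufferTexel, sample_surface) are placeholders of mine, not the actual implementation.

```cpp
#include <cstdint>
#include <vector>

// Per-pixel output of the geometry pass: everything the texturing pass needs,
// already resolved, so the second pass never has to touch world space.
struct GBufferTexel {
    float    depth;       // written alongside the depth buffer
    uint8_t  normal_axis; // which cube-face axis this pixel lies on (0=X, 1=Y, 2=Z)
    int8_t   normal_sign; // +1 or -1 along that axis
    uint16_t lookup_u;    // resolved coordinate into the surface texture
    uint16_t lookup_v;
};

// Stand-in for sampling the big surface texture at the resolved coordinates.
static uint32_t sample_surface(uint16_t /*u*/, uint16_t /*v*/) { return 0xFFFFFFFF; }

// Pass 2: pure screen space. Any geometry model can have filled the buffers in
// pass 1; only the pixels that survived depth testing get textured and lit here.
void shade_pass(const std::vector<GBufferTexel>& gbuffer,
                std::vector<uint32_t>& framebuffer)
{
    for (size_t i = 0; i < gbuffer.size(); ++i) {
        const GBufferTexel& t = gbuffer[i];
        framebuffer[i] = sample_surface(t.lookup_u, t.lookup_v); // lighting would modulate this
    }
}
```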
Mixing different geometry models is very important if we want to get serious draw distance improvements. The traditional geometry mesh above is efficient and fast to draw and, relative to most meshes, is very small. However, it is not a simple task to reduce detail when the surfaces have such well-defined blockiness. To avoid nickel-and-diming the GPU memory to death, we need to pull out some pennies. Most of the time, natural terrain (minus trees) is smooth and generally doesn't exhibit concavities on the sides. In cases like these, we can take advantage of the simpler structure and describe it with a heightmap. How it works is you take a 32x32 texture (part of a larger one, like the surface data) and assign it to a super chunk. Then you model a cube around the chunk and use that surface as the starting points for raytracing through the chunk. Each time a ray passes through a new gridpoint, it checks the heightmap to see if it is at or below the top level for that xz coordinate, and if it is, returns the camera distance and face normal for that cube.
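As a rough sketch of the traversal (CPU-side, one-sided, and with all names my own; the real version is two-sided and lives in a pixel shader), it might look something like this:

```cpp
#include <cmath>
#include <cstdint>

constexpr int CHUNK = 32;

struct Hit { bool hit; float t; int axis; };  // distance along the ray + face axis

// heights[x][z] = top surface level for that column, 0..31.
// The ray starts on the chunk's bounding cube; entry_axis is the face it entered through.
Hit raycast_heightmap(const uint8_t heights[CHUNK][CHUNK],
                      float ox, float oy, float oz,   // ray origin (chunk-local)
                      float dx, float dy, float dz,   // ray direction (normalized)
                      int entry_axis)
{
    int x = (int)std::floor(ox), y = (int)std::floor(oy), z = (int)std::floor(oz);
    int sx = dx > 0 ? 1 : -1, sy = dy > 0 ? 1 : -1, sz = dz > 0 ? 1 : -1;

    // Distance along the ray to the next grid boundary on each axis, and the
    // cost of crossing one whole cell on that axis (standard voxel traversal).
    auto boundary = [](float o, float d, int i, int s) {
        return d != 0 ? ((i + (s > 0 ? 1 : 0)) - o) / d : INFINITY;
    };
    float tmx = boundary(ox, dx, x, sx), tdx = dx != 0 ? std::fabs(1.0f / dx) : INFINITY;
    float tmy = boundary(oy, dy, y, sy), tdy = dy != 0 ? std::fabs(1.0f / dy) : INFINITY;
    float tmz = boundary(oz, dz, z, sz), tdz = dz != 0 ? std::fabs(1.0f / dz) : INFINITY;

    float t = 0; int axis = entry_axis;
    while (x >= 0 && x < CHUNK && y >= 0 && y < CHUNK && z >= 0 && z < CHUNK) {
        // Solid if we're at or below the column's top level for this xz.
        if (y <= heights[x][z]) return {true, t, axis};

        if (tmx <= tmy && tmx <= tmz) { t = tmx; tmx += tdx; x += sx; axis = 0; }
        else if (tmy <= tmz)          { t = tmy; tmy += tdy; y += sy; axis = 1; }
        else                          { t = tmz; tmz += tdz; z += sz; axis = 2; }
    }
    return {false, 0, 0};  // ray left the chunk without hitting anything
}
```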
Actually, it's a bit more sophisticated than that. Since we only need 5 bits for the heightmap and are getting 16 bits per texel from the texture, I've made it into a 2-sided heightmap, with a top and a bottom surface. There are also 6 bits left over, which can be used as a shortcut to designate the top-block type (with 63 possible presets). By doing it this way, you could, for example, have the sides textured as solid dirt and keep the top layer as grass without making a ton of custom surface patches. Also, by making the heightmap 2-sided, you can describe complex formations simply by drawing multiple heightmaps to the same chunk. This technique is similar to one called parallax occlusion mapping, though specialized for our purposes.
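A plausible packing of that 16-bit column, purely as an illustration (only the 5/5/6 split comes from the post; the bit order is my own):

```cpp
#include <cstdint>

// One heightmap texel per xz column: 5 bits top surface, 5 bits bottom surface,
// 6 bits for the top-block preset.
inline uint16_t pack_column(uint8_t top, uint8_t bottom, uint8_t preset)
{
    return (uint16_t)((top & 0x1F) | ((bottom & 0x1F) << 5) | ((preset & 0x3F) << 10));
}

inline void unpack_column(uint16_t v, uint8_t& top, uint8_t& bottom, uint8_t& preset)
{
    top    = v & 0x1F;
    bottom = (v >> 5) & 0x1F;
    preset = (v >> 10) & 0x3F;
}
```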
Oh, and at 16 bits per xz column (half of one vertex in the other method), you can pretty much give up on trying to beat it in terms of memory usage.
Heightmap rendering can be slow to draw, though, if there is a lot of space between the ray's starting point and its final position, or in oblique cases where the ray passes back out the other side of the chunk without encountering anything. This can be mitigated by having a more closely fitting mesh surrounding the heightmap, so that the rays start closer to their destinations. Of course, that means going back to storing vertex data again, albeit less detailed. An alternative would be to use GPU tessellation to generate the starting mesh directly from the heightmap (it still can't go straight to the final product, because tessellators don't generate a square pattern), but that functionality is only available on DX11 cards, so a lot of people couldn't use it. Everything is a tradeoff, and there are lots of factors to play with. For instance, I might be able to shrink the heightmaps further by halving their resolution and blending the results; from orbit, a single block offset won't even be visible. It might even be worth it to mix heightmaps and traditional models in the same chunk for, say, complex buildings. Even Crysis used heightmaps for most of its terrain, with things like cliffs and overhangs added as standard meshes. We can do all of that without any trouble thanks to the 2-pass system.
Heightmaps are also nice because they work almost exclusively in screen space. Traditional model draw times scale with the number of polygons being drawn; raycast models scale with the amount of screen space being used. This is how we will achieve massive detail, limited primarily by GPU memory (which at this point we are sipping) and screen resolution (the fastest way to double your fps in a shader-heavy game is to play windowed).
----------------------------------
That's all I have for now. I'll spare you from all the stuff I tried which didn't pan out. Also, I should mention that most of this work is pen and paper, so there's nothing to show, but the theory is all very sound. There's a ridiculous amount to do but things are looking more promising than ever. Aside from my inexplicable migraine, I'm feeling really good about this.
And as always, I have no idea how much of this will make sense to the laymen, so just ask if there is something you didn't get.