Another rendering update

Anything concerning the ongoing creation of Futurecraft.
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye
Another rendering update

Post by fr0stbyte124 » Sat Dec 29, 2012 5:04 am

I'll make a proper development log before too long, but for now I'll just post some progress here.

I've been revisiting the render pipeline the last few days, ironing out some details and expanding on what was already worked out.

You may remember that the opaque cube renderer, which is the bread and butter of the terrain rendering, completely separates the geometry data from the texture data so that both can be optimized independently. The surface data is stored in two massive textures (the default size for desktop GPUs will be 4096x4096, totaling 112MB; if you have a ton of onboard memory, we could take it up to 8192x8192, or 448MB). If you've seen anything about the megatextures in Rage, this is similar: subdivide a giant virtual texture into visible patches, then use a secondary lookup texture and custom shaders to locate them in the onboard texture space. In this case, patches are 8x8 and there are 16 of them per layer of each 32^3 super chunk. There can be up to 16 layers, and layers can be merged together or reused if there is no overlap.
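
To make the indirection concrete, here's a rough Java sketch of the lookup step, with a flat array standing in for the lookup texture. The names, layout, and the 4x4 patch arrangement per layer are illustrative assumptions, not the actual implementation:

Code:

public final class PatchAtlas {
    static final int ATLAS_SIZE = 4096;                          // texels per side
    static final int PATCH_SIZE = 8;                             // 8x8 patches
    static final int PATCHES_PER_ROW = ATLAS_SIZE / PATCH_SIZE;  // 512 slots per row

    // Indirection table: for each (layer, patch-of-layer) of a superchunk,
    // the slot index of its 8x8 patch inside the big atlas.
    private final int[] lookup;

    PatchAtlas(int[] lookup) { this.lookup = lookup; }

    /** Map a texel (u,v in 0-31) on a superchunk layer to its atlas texel. */
    int[] resolve(int layer, int u, int v) {
        int patchOfLayer = (v / PATCH_SIZE) * 4 + (u / PATCH_SIZE); // 4x4 patch grid
        int slot = lookup[layer * 16 + patchOfLayer];               // 16 patches per layer
        int baseX = (slot % PATCHES_PER_ROW) * PATCH_SIZE;
        int baseY = (slot / PATCHES_PER_ROW) * PATCH_SIZE;
        return new int[] { baseX + (u % PATCH_SIZE), baseY + (v % PATCH_SIZE) };
    }
}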

The geometry part is cut down to bare bones, and it doesn't need much, since this pipeline handles opaque, full-sized cubes exclusively. Vertices are stored as 3x8 bits, in coordinates local to the super chunk. Each coordinate only has a range of 0-33, but 8 bits is the smallest a vertex coordinate can go. We also include another 8 bits of padding, which I might use for something later on. Face normals won't work properly on cubes (I spent a few days trying). Instead, I calculate the face on a per-pixel basis: one of the xyz coordinates in model space will be a whole number, since each pixel rests on a block face.
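
As an illustration, a packed vertex could look like this; the exact bit order is my own choice for the sketch:

Code:

// One vertex in 32 bits: x | y<<8 | z<<16, with bits 24-31 reserved as padding.
static int packVertex(int x, int y, int z) {
    return (x & 0xFF) | (y & 0xFF) << 8 | (z & 0xFF) << 16;
}

static int unpackX(int v) { return v & 0xFF; }
static int unpackY(int v) { return (v >>> 8) & 0xFF; }
static int unpackZ(int v) { return (v >>> 16) & 0xFF; }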

Model meshing is really complicated and I won't get into it here, but if you are curious, I am using methods quite similar to the ones discussed on 0fps. Take a look if you haven't seen it. Originally, I wanted to use triangle meshes and minimize the number of vertices. However, after running the numbers, our vertices are so small that the indices alone make up a significant part of the mesh definition. To minimize the total size, which is the goal, you have to look at the whole picture. It's a complicated mess, but long story short, the greedy rectangle mesh wins out.
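
For the curious, the core of a greedy-rectangle pass over a single 32x32 face mask looks something like this. It's a simplified sketch in the spirit of the 0fps write-up, not the actual mesher (which also has to handle all six facings and only merge faces with identical attributes):

Code:

import java.util.List;

static void greedyMesh(boolean[][] mask, int n, List<int[]> quadsOut) {
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; ) {
            if (!mask[y][x]) { x++; continue; }
            int w = 1;                              // grow the run to the right
            while (x + w < n && mask[y][x + w]) w++;
            int h = 1;                              // then grow the rectangle downward
            outer:
            while (y + h < n) {
                for (int k = x; k < x + w; k++)
                    if (!mask[y + h][k]) break outer;
                h++;
            }
            for (int dy = 0; dy < h; dy++)          // consume the rectangle
                for (int dx = 0; dx < w; dx++)
                    mask[y + dy][x + dx] = false;
            quadsOut.add(new int[] { x, y, w, h }); // emit one merged quad
            x += w;
        }
    }
}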

------------------------------
Anyway, all that is mostly review. Now for the new stuff.

I am now separating the surface data and the geometry even further by making it a two-pass system. The first pass writes geometry to the depth buffer, along with several other buffers which contain the normals and most of the resolved lookup coordinates. The second pass does the texturing and any dynamic lighting we may add. By doing it this way, we can mix and match any number of geometry models on a per-chunk basis, and ensure that only the final tally of visible stuff gets textured. Another advantage of writing the lookup data to the buffers rather than figuring it out later is that you can work in different spaces. Since this needs to work at a range of kilometers, it may be better to do local operations on a smaller scale to keep floating-point precision error under control. Then, by storing the final answer to buffers, the second pass can work exclusively in screen space rather than the potentially gigantic world space.
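
To sketch what the first pass sets up, here is roughly what the buffer creation could look like in LWJGL (which Minecraft already ships with). The attachment formats and names are placeholders, not final decisions:

Code:

import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL14.GL_DEPTH_COMPONENT24;
import static org.lwjgl.opengl.GL20.glDrawBuffers;
import static org.lwjgl.opengl.GL30.*;

import java.nio.ByteBuffer;
import java.nio.IntBuffer;

import org.lwjgl.BufferUtils;

public final class GeometryPass {
    int fbo;

    void init(int w, int h) {
        fbo = glGenFramebuffers();
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        int depth  = tex(w, h, GL_DEPTH_COMPONENT24, GL_DEPTH_COMPONENT, GL_UNSIGNED_INT);
        int normal = tex(w, h, GL_RGBA8,  GL_RGBA, GL_UNSIGNED_BYTE);  // per-pixel face normal
        int lookup = tex(w, h, GL_RGBA16, GL_RGBA, GL_UNSIGNED_SHORT); // resolved lookup coords
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,  GL_TEXTURE_2D, depth,  0);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, normal, 0);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, lookup, 0);
        IntBuffer targets = BufferUtils.createIntBuffer(2);
        targets.put(GL_COLOR_ATTACHMENT0).put(GL_COLOR_ATTACHMENT1).flip();
        glDrawBuffers(targets); // pass 1 draws geometry into these; pass 2 is a fullscreen quad
    }

    private static int tex(int w, int h, int internalFmt, int fmt, int type) {
        int id = glGenTextures();
        glBindTexture(GL_TEXTURE_2D, id);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, internalFmt, w, h, 0, fmt, type, (ByteBuffer) null);
        return id;
    }
}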

Mixing different geometry models is very important if we want serious draw-distance improvements. The traditional geometry mesh above is efficient and fast to draw, and relative to most meshes it is very small. However, it is not a simple task to reduce detail when the surfaces have such well-defined blockiness. To avoid nickel-and-diming the GPU memory to death, we need to pull out some pennies. Most of the time, natural terrain (minus trees) is smooth and generally doesn't exhibit concavities on the sides. In cases like these, we can take advantage of the simpler structure and describe it with a heightmap. How it works is you take a 32x32 texture (part of a larger one, like the surface data) and assign it to a superchunk. Then you model a cube around the chunk and use that surface as the starting points for raytracing through the chunk. Each time a ray passes through a new gridpoint, it checks the heightmap to see if it is at or below the top level for that xz coordinate, and if it is, returns the camera distance and normal face for that cube.
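
A toy version of that loop, using a crude fixed step instead of exact boundary stepping, just to show the shape of the test (top-surface only; everything here is illustrative):

Code:

// March a ray through a 32^3 superchunk; a column is solid from y=0 up to
// top[x][z], so a sample is a hit when it falls at or below that level.
static float march(int[][] top, float[] origin, float[] dir, float maxDist) {
    for (float t = 0f; t < maxDist; t += 0.5f) {
        int x = (int) (origin[0] + dir[0] * t);
        int y = (int) (origin[1] + dir[1] * t);
        int z = (int) (origin[2] + dir[2] * t);
        if (x < 0 || x > 31 || y < 0 || y > 31 || z < 0 || z > 31)
            return -1f;                 // passed out the other side: miss
        if (y <= top[x][z])
            return t;                   // camera distance to the hit
    }
    return -1f;
}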

Actually, it's a bit more sophisticated than that. Since we only need 5 bits for the heightmap and are getting 16 bits from the texture, I've made it a 2-sided heightmap, with a top and a bottom surface. There are also 6 bits left over, which can be used as a shortcut to designate the top-block type (with 63 possible presets). By doing it this way, you could, for example, have the sides textured as solid dirt and keep the top layer as grass without making a ton of custom surface patches. Also, by making the heightmap 2-sided, you can describe complex formations simply by drawing multiple heightmaps to the same chunk. This technique is similar to one called parallax occlusion mapping, though specialized for our purposes.
Oh, and at 16 bits per xz stack (half of one vertex in the other method), you can pretty much give up on trying to beat it in terms of memory usage.
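
In code, the 16-bit column could be packed like so (the field order is my assumption):

Code:

// 5 bits top height, 5 bits bottom height, 6 bits top-block preset.
static short packColumn(int top, int bottom, int preset) {
    return (short) ((top & 0x1F) | (bottom & 0x1F) << 5 | (preset & 0x3F) << 10);
}

static int top(short c)    { return c & 0x1F; }
static int bottom(short c) { return ((c & 0xFFFF) >>> 5) & 0x1F; }
static int preset(short c) { return (c & 0xFFFF) >>> 10; }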

Heightmap rendering can be slow to draw, though, if there is a lot of space between the ray's starting point and its final position, or in oblique cases where the ray passes back out the other side of the chunk without encountering anything. This can be mitigated by having a more closely fitting mesh surrounding the heightmap, so that rays start closer to their destinations. Of course, that means going back to storing vertex data, albeit less detailed. An alternative would be to use GPU tessellation to generate the starting mesh directly from the heightmap (it still can't go straight to the final product, because tessellators don't generate a square pattern), but that functionality is only available on DX11 cards, so a lot of people couldn't use it. Everything is a tradeoff, and there are lots of factors to play with. For instance, I might be able to shrink the heightmaps further by halving their resolution and blending the results; from orbit, a single-block offset won't even be visible. It might even be worth it to mix heightmaps and traditional models in the same chunk for, say, complex buildings. Even Crysis used heightmaps for most of its terrain, with things like cliffs and overhangs added as standard meshes. We can do all of that without any trouble thanks to the 2-pass system.

Heightmaps are also nice because they work almost exclusively in screen space. Traditional model draw times scale with the number of polygons being drawn; raycasting models scale with the amount of screen space being used. This is how we will achieve massive detail, limited primarily by GPU memory (which at this point we are sipping) and screen resolution (the fastest way to double your fps in a shader-heavy game is to play windowed).

----------------------------------
That's all I have for now. I'll spare you from all the stuff I tried which didn't pan out. Also, I should mention that most of this work is pen and paper, so there's nothing to show, but the theory is all very sound. There's a ridiculous amount to do but things are looking more promising than ever. Aside from my inexplicable migraine, I'm feeling really good about this.

And as always, I have no idea how much of this will make sense to the layman, so just ask if there is something you didn't get.

Prototype
Developer
Posts:2968
Joined:Fri Dec 07, 2012 1:25 am
Affiliation:NSCD
IGN:Currently:Small_Bear
Location:Yes

Re: Another rendering update

Post by Prototype » Sat Dec 29, 2012 6:16 am

Finally a sign.

That seems like a good system, from what I can understand, which is very little, but I can see that someone is still working on Futurecraft, which is good news.

Keep it up
Mistake Not... wrote: This isn't rocket science, *!

Shadowcatbot
Vice Admiral
Vice Admiral
Posts:2623
Joined:Thu Dec 06, 2012 9:46 pm
Affiliation:Nivanshae
IGN:_Shadowcat_
Location:Munching on important looking wires.

Re: Another rendering update

Post by Shadowcatbot » Sat Dec 29, 2012 6:00 pm

Uhm, yeah, keep up the good work fr0st!... He isn't building a nuke or something, right? I understood like 1/5 of that...
In yo ceiling, stealin yo wires



Do not open. Ever. At all. Enter at your own risk to life and limb.
Trigger warning
Bot gore warning
Memetic biohazard
Error bait
Spoiler:
[Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted][Redacted]

ACH0225
Vice Admiral
Vice Admiral
Posts:2312
Joined:Sun Dec 09, 2012 10:21 pm
Affiliation:Strigiforme
IGN:ACH0225
Location:Cuuyth

Re: Another rendering update

Post by ACH0225 » Sun Dec 30, 2012 10:21 am

I totally understood all the stuff in there! Great work as ever, Fr0st.
fr0stbyte124 wrote:5 months from now, I will publish a paper on an efficient method for rendering millions of owls to a screen.

Dr. Mackeroth
Designer
Posts:397
Joined:Fri Dec 07, 2012 11:59 pm
Affiliation:Alteran
Location:In the Holy Citadel of Altera

Re: Another rendering update

Post by Dr. Mackeroth » Sun Dec 30, 2012 8:57 pm

Post acknowledged. I finally got around to looking in the Development section; pretty stupid of me not to earlier. I don't have the time to read, translate, and then decode your post now, but when I do, I'll post a proper reply.
This is a signature.

hyperlite
Lieutenant
Posts:360
Joined:Thu Dec 06, 2012 3:46 pm

Re: Another rendering update

Post by hyperlite » Sun Dec 30, 2012 9:44 pm

Image

Prototype
Developer
Posts:2968
Joined:Fri Dec 07, 2012 1:25 am
Affiliation:NSCD
IGN:Currently:Small_Bear
Location:Yes

Re: Another rendering update

Post by Prototype » Mon Dec 31, 2012 5:07 am

Most of that makes some sense to me, but I don't know the correct technical terms to describe it to everyone else.

But it looks convincing
Mistake Not... wrote: This isn't rocket science, *!

fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Mon Dec 31, 2012 6:43 am

Prototype wrote:Most of that makes some sense to me, but I don't know the correct technical terms to describe it to everyone else.

But it looks convincing
Yes, yes. A fine ruse indeed.
Spoiler:
mwa ha ha ha ha ha
Anyway, I found a cool demo of a similar technique which should help showcase what I'm talking about:
http://www.stobierski.pl/unity/RSP_demo1/WebPlayer.html (needs the Unity web player).
Heck, here's some more: http://forum.unity3d.com/threads/161412 ... sset-Store

The thing to note from the demo is that all those shapes are in fact completely smooth. All lighting, self-shadowing, and depth detail are written onto textures and rendered on the fly in real time. This is far more efficient storage-wise than storing geometry to get those fine details, and we'll need every bit of storage we can muster. I came up with some presets we could use with the remaining 6 bits in the heightmap, and theoretically we should be able to use it as a drop-in long-distance terrain renderer, even outside of Futurecraft. Assuming a single layer, if the preset matches, you could have a mod running on a multiplayer server, dropping in miles of terrain to the client at the price of 0.5 bits per block. That's not even taking external compression into consideration. Also keep in mind that this is for superficial detail, so only the top layers of chunks need to be sent. Optimally, we could upload four chunks of terrain in 2KB, or, to put that in perspective, about 1/4 more data than this post, and then run it natively right out of the box. No tessellating, no chunk building.
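
A quick sanity check on that figure, assuming one 16-bit column per xz position of a 32x32x32 superchunk, with the 32x32 footprint covering a 2x2 grid of vanilla 16x16 chunk columns:

Code:

int columns = 32 * 32;                  // one heightmap column per xz position
int bits    = columns * 16;             // 16,384 bits = 2,048 bytes = 2 KB
int blocks  = 32 * 32 * 32;             // blocks described by one superchunk
float perBlock = bits / (float) blocks; // 16384 / 32768 = 0.5 bits per block
// ...and 2 KB for the 2x2 footprint is "four chunks of terrain in 2KB".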

I'm very excited about this method.

Ivan2006
Fleet Admiral
Posts:3021
Joined:Fri Dec 07, 2012 12:10 pm
Affiliation:[redacted]
IGN:Ivan2006
Location:In a universe.
Contact:

Re: Another rendering update

Post by Ivan2006 » Mon Dec 31, 2012 11:39 am

And how far would the view distance in FC be with the same computing power that vanilla needs for "far" render distance?
Just so someone gets an idea of what it all is about.
Quotes:
Spoiler:
CMA wrote:IT'S MY HOT BODY AND I DO WHAT I WANT WITH IT.
Tiel wrote:hey now no need to be rough
Daynel wrote: you can talk gay and furry to me any time
CMA wrote:And I can't fuck myself, my ass is currently occupied

blockman42
Lieutenant
Posts:478
Joined:Thu Dec 06, 2012 8:04 pm
Affiliation:Voxel Co.
IGN:Blockman42
Location:Holocene

Re: Another rendering update

Post by blockman42 » Mon Dec 31, 2012 6:48 pm

Progress? Fr0st must be trolling us

fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Mon Dec 31, 2012 11:10 pm

Ivan2006 wrote:And how far would the view distance in FC be with the same computing power that vanilla needs for "far" render distance?
Just so someone gets an idea of what it all is about.
It's not apples to apples, if that is what you are asking. Performance will be dictated largely by the hardware you are using, though hopefully even low-end hardware should outperform vanilla Minecraft.

Theoretically, with sufficient memory and some lucky breaks with the terrain layout, we could go until we start running into floating-point precision errors. The vanilla graphics pipeline is tied up in a few places: memory, playback, and chunk loading. Chunk loading doesn't hurt the fps too much because it is throttled pretty heavily, but it makes a difference in how quickly you can move around and what sort of bandwidth you are consuming. Playback is what happens every frame when chunks are drawn one by one to the screen. How efficiently this is handled sets the upper bound on performance as we increase the number of chunks. The rule of thumb is that the less you need to change between draw calls, and the more you can put in each call, the better performance you get. One thing we'll be able to do is keep track of which parts of which meshes are going to be visible and only display those, to keep the number of draw calls down. My old occlusion-culling algorithm only needed to update when the camera crossed chunk boundaries, and the new system should be even more efficient. So that helps.
But memory will be our primary focus. If you can't keep the geometry in dedicated RAM, the constant transfers will drag down the entire pipeline. And since we can't easily hide unused detail in a blocky environment, that's where the focus needs to be, and what the heightmaps most directly address.

That's a bit of a non-answer, isn't it? I'll say this, though: load times will be much, much faster, and draw distance will have a much smaller impact on the framerate, though the initial cost might be somewhat higher than vanilla. We'll just have to wait and see how well it performs once the pipeline is complete. Worst-case scenario, a player can switch back to the vanilla pipeline and not have any worse performance, though I honestly believe that won't be necessary.
blockman42 wrote:Progress? Fr0st must be trolling us
Yes, you've found me out. I take great pride in making my trolling look legit, often putting weeks of research into modern gamedev techniques to get all the details right.
One might ask, "Fr0stbyte, if you spend so long on this just to troll people, why don't you just make the real thing?" But of course, if you have to ask, then you could never truly understand the heart of a troll such as myself.

cats
Rear Admiral
Posts:1853
Joined:Wed Dec 05, 2012 10:03 pm

Re: Another rendering update

Post by cats » Tue Jan 01, 2013 12:20 am

fr0stbyte124 wrote:But of course, if you have to ask, then you could never truly understand the heart of a troll such as myself.
That's sig-able.
"Any sufficiently advanced technology is indistinguishable from a completely ad-hoc plot device"
— David Langford
Spoiler:
cannonfodder wrote:it's funny because sonic's face looks like a * and faces aren't supposed to look like a *

blockman42
Lieutenant
Posts:478
Joined:Thu Dec 06, 2012 8:04 pm
Affiliation:Voxel Co.
IGN:Blockman42
Location:Holocene

Re: Another rendering update

Post by blockman42 » Tue Jan 01, 2013 11:26 am

catsonmeth wrote:
fr0stbyte124 wrote:But of course, if you have to ask, then you could never truly understand the heart of a troll such as myself.
That's sig-able.
Everything he says turns into a sig

hyperlite
Lieutenant
Posts:360
Joined:Thu Dec 06, 2012 3:46 pm

Re: Another rendering update

Post by hyperlite » Tue Jan 01, 2013 11:35 am

One day he is gonna break it to us that all this is false, and I will laugh.

fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Wed Jan 02, 2013 11:07 am

I want to elaborate a little on how heightmaps are turned into 3D coordinates.

Displacement mapping is a technique in which a flat polygon's texture coordinates are shifted to present the illusion of depth, accurate along the camera's line of sight. The point is to store a potentially large amount of surface detail without using many polygons. If you look at a stone wall in a 3D game and the stones appear to pop out of the wall, that is very likely displacement mapping.

There are several varieties to pick from, depending on what sort of material you are using, what angles the surface will be viewed from, what level of detail you need to be accurate to, and how much processing power you can afford to spend on the shader. The variety we are most interested in is called parallax occlusion mapping, or POM. You can consider it the full version of the displacement mapping family, capable not only of texture displacement, but also self-occlusion (i.e. parts of the surface can overlap) and self-shadowing.

The way POM accomplishes this is by ray-tracing. There are variations on the technique, but they all work with rays starting at the surface of the object's outer mesh hull. Each ray travels down the line away from the camera a calculated distance and checks whether that point's vertical position is above or below the heightmap. Depending on the result, the ray may shift around some more before it's satisfied it has the right spot and returns the final texture coordinates. Most of the research into this rendering technique concerns how far each step should be. The number of steps taken is the cost of the shader for that ray, and in general the slowest ray becomes the cost of every ray in the batch. That's why it is important to keep the number of steps as low as possible.

A good way to keep the number of steps down is to make the containing mesh rest as closely to the virtual surface as possible while still fully containing it (rays cannot originate from outside the mesh). But of course, then there is a trade-off between the level of detail in the mesh and how many polygons you are going to spend on the venture (remember, the point of POM is to reduce the polygon budget). The absolute worst-case scenario is a ray that passes all the way through the mesh without ever intersecting the virtual surface. It's so expensive, in fact, that there are special-case shaders designed specifically for handling silhouettes.

-------------

So that's the starting gate. We have a heightmap (a pair, actually), and we have a mesh. The simplest mesh we could use is a bounding box surrounding the entire chunk, but that will lead to lots of worst-case events when viewed from the side. We could shrink the bounding box to the terrain's outer dimensions and get a little improvement for free. We could add more vertices and make odd-shaped meshes which get rays much closer to a good starting point, but then we are back to relying on vertices, albeit in a much diminished capacity. If we do that, we can also mark the maximum traversal distance for each ray by rendering the depth map of the back-facing triangles first and stopping when the rays pass that point, but then we've added more complexity to the pipeline. Then there are combinations of the above, like starting with an odd-shaped mesh and letting rays travel to the edge of the bounding box, taking that performance hit. With DirectX 11-style tessellation (OpenGL 4.0), we can procedurally generate the surface mesh from the heightmap pretty cheaply, which would save space in video memory, though at the cost of extending the pipeline yet again. Also, it would be limited to high-end cards.

Now, that's just the surface mesh. The other side of the shader, as you'll remember, is the step size. In Minecraft, we've got things good, for once. The chunk volume is evenly divided, and each cell is either completely occupied or completely empty (remember, this pipeline caters exclusively to the opaque cube blocks). Figuring out the step size from one block to another is simple: you take the distance from your current point to the next X, Y, and Z boundary along your ray (very basic vector math), and then take the minimum of the three as your next step distance. You then know which block you enter, which face you are entering from, and where you are on that face. If the block is occupied, you're done. If it's empty, get your next step distance and continue.
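
In case the vector math is hazy, here is that step rule as a minimal Java sketch (the real version would live in the shader; this is the same idea as the classic Amanatides & Woo voxel traversal):

Code:

// Distance along the ray to the nearest X, Y, or Z grid boundary.
static float nextStep(float[] pos, float[] dir) {
    float best = Float.MAX_VALUE;
    for (int axis = 0; axis < 3; axis++) {
        if (dir[axis] == 0f) continue;               // ray parallel to this axis
        float boundary = dir[axis] > 0f
                ? (float) Math.floor(pos[axis]) + 1f // next boundary above
                : (float) Math.ceil(pos[axis]) - 1f; // next boundary below
        float t = (boundary - pos[axis]) / dir[axis];
        if (t < best) best = t;                      // minimum of the three
    }
    return best;
}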

Those are the basic rules, and they're rather easy to implement, but depending on how accurate the mesh is, it could still end up with some pretty long traversal times. So here we introduce another raytracing technique, this one adopted from fully-fledged voxel renderers: distance mapping. In a voxel structure, you can potentially spend a really long time traversing the data structure before you find anything interesting. Distance mapping speeds that up by telling you how far you can safely travel without hitting anything. How it works is that each unoccupied space on the grid stores a value saying how many adjacent spaces you can travel in any direction before you hit anything. In spaces next to the surface, it's not very helpful, but out in the open, you can rack up some major speed.
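
As a sketch of the idea (not something we can store directly, since we have no 3D voxel structure on the GPU), a distance map can be built with a multi-source BFS from the occupied cells:

Code:

import java.util.ArrayDeque;

// dist[x][y][z] = number of cells you can step in any direction before
// possibly hitting something (0 for occupied cells).
static int[][][] distanceMap(boolean[][][] solid, int n) {
    int[][][] dist = new int[n][n][n];
    ArrayDeque<int[]> queue = new ArrayDeque<int[]>();
    for (int x = 0; x < n; x++)
        for (int y = 0; y < n; y++)
            for (int z = 0; z < n; z++)
                if (solid[x][y][z]) queue.add(new int[] { x, y, z });
                else dist[x][y][z] = Integer.MAX_VALUE;
    while (!queue.isEmpty()) {                      // BFS over 26-neighborhoods
        int[] c = queue.poll();
        int d = dist[c[0]][c[1]][c[2]];
        for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
                for (int dz = -1; dz <= 1; dz++) {
                    int x = c[0] + dx, y = c[1] + dy, z = c[2] + dz;
                    if (x < 0 || y < 0 || z < 0 || x >= n || y >= n || z >= n)
                        continue;
                    if (dist[x][y][z] > d + 1) {
                        dist[x][y][z] = d + 1;      // Chebyshev distance to a solid cell
                        queue.add(new int[] { x, y, z });
                    }
                }
    }
    return dist;
}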

But we're not set up to store a 3D voxel structure, so what can we do with this? Well, what we do have is the vertical distance to the surface for each column. If we could ensure some guaranteed horizontal distance too, we could use the speedup. As it happens, we can do just that. Enter mipmaps. Mipmaps are essentially smaller copies of a texture. As something becomes more distant, you can use these smaller textures to efficiently sample proportionally larger sections of the original texture with better caching and fewer artifacts. If you are using a 32x32 heightmap, you would also include a 16x16, an 8x8, a 4x4, a 2x2, and a 1x1 texture, which together is about 1/3 more memory. A little expensive, but nothing compared to extra vertices. Once these mipmaps are in place, you can sample them like normal, only instead of traversing a 32^3 grid, you are traversing a 16^3 or an 8^3 grid with guaranteed empty spaces. Not only does this let you use fewer steps, but each step may be cheaper, as the GPU can take full advantage of cached texture data.
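
A minimal sketch of building that chain on the CPU, using the top-surface map and taking the max of each 2x2 block so the coarser levels stay conservative (a ray above the coarse texel is guaranteed to be above everything underneath it):

Code:

static int[][][] buildMaxMips(int[][] base, int n) {      // n = 32 here
    int levels = Integer.numberOfTrailingZeros(n) + 1;    // 32 -> 6 levels
    int[][][] mips = new int[levels][][];
    mips[0] = base;
    for (int l = 1; l < levels; l++) {
        int m = n >> l;
        mips[l] = new int[m][m];
        for (int x = 0; x < m; x++)
            for (int z = 0; z < m; z++) {
                int[][] prev = mips[l - 1];               // finer level
                mips[l][x][z] = Math.max(
                    Math.max(prev[2 * x][2 * z],     prev[2 * x + 1][2 * z]),
                    Math.max(prev[2 * x][2 * z + 1], prev[2 * x + 1][2 * z + 1]));
            }
    }
    return mips;
}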

-------------

So, in summary, what we're looking at is probably each convex-hull shape having its own heightmap (with mipmaps) and a minimum-spanning bounding box. Each vertex of the bounding box carries directions to the shader for finding the correct heightmap, as well as a recommendation for which mipmap octave to start in. If that's not cutting it, or we think we can get away with it, we start adding vertices to the mesh to get the starting points closer, and let rays bail out once they reach the opposite side of the bounding box (i.e. no external knowledge needed).

The times we spend a lot on texture access are when there are a lot of chunks on screen, which also happens to be when you are looking down from high altitude. Because of that viewing angle, there aren't many silhouette cases, so worst cases should be at a minimum, particularly if we use odd-shaped meshes. It is also worth noting that we don't need to pick all bounding boxes or all odd meshes; OpenGL doesn't have any context of the shape beyond the vertex, so we can pick and choose as needed.

-------------
*edit*
Oh, that wasn't little at all, was it...
