Another rendering update

Anything concerning the ongoing creation of Futurecraft.
User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye
Re: Another rendering update

Post by fr0stbyte124 » Thu Jan 10, 2013 2:44 pm

Quick follow-up on entities. This is an interesting topic because of how the hardware is designed. Most of the time, when you have a bunch of identical objects on screen that all move around at different positions, you want to use something called instancing, if it's available. What this does is let you store the model data once, with per-instance data stored in another buffer. Typically that's a custom transformation matrix which moves the model into the correct world coordinates, but it can also be more advanced stuff, like entire skeleton poses. Once that is stored, you make a single draw call to generate X instances.
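
To make that concrete, here's roughly what the setup looks like with LWJGL-style GL calls. This is a sketch only; the attribute slots and buffer names are just for illustration, not what we'd actually use:
[code]
import java.nio.FloatBuffer;

import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.GL15;
import org.lwjgl.opengl.GL20;
import org.lwjgl.opengl.GL31;
import org.lwjgl.opengl.GL33;

public class InstancedDraw {
    /** Draws instanceCount copies of the model in one call; each instance reads its own matrix. */
    public static void drawInstances(int modelVbo, int vertexCount,
                                     FloatBuffer instanceMatrices, int instanceCount) {
        // Per-instance data: one 4x4 matrix per instance, stored in its own buffer.
        int instanceVbo = GL15.glGenBuffers();
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, instanceVbo);
        GL15.glBufferData(GL15.GL_ARRAY_BUFFER, instanceMatrices, GL15.GL_STREAM_DRAW);

        // A mat4 attribute spans four consecutive attribute slots (4..7 here, arbitrarily).
        for (int col = 0; col < 4; col++) {
            GL20.glEnableVertexAttribArray(4 + col);
            GL20.glVertexAttribPointer(4 + col, 4, GL11.GL_FLOAT, false, 64, col * 16);
            GL33.glVertexAttribDivisor(4 + col, 1); // advance once per instance, not per vertex
        }

        // The model itself is stored only once.
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, modelVbo);
        GL20.glEnableVertexAttribArray(0);
        GL20.glVertexAttribPointer(0, 3, GL11.GL_FLOAT, false, 12, 0);

        // A single draw call generates X instances.
        GL31.glDrawArraysInstanced(GL11.GL_TRIANGLES, 0, vertexCount, instanceCount);
    }
}
[/code]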

The old-school way to do this is to push matrix transformations onto a dedicated matrix stack and call a display list to draw the model each time. The problems are twofold. First, I'm pretty sure editing the transformation matrix counts as a state change, which causes the render pipeline to flush (in other words, it finishes drawing everything it was working on before applying the new matrix, because the change would affect everything still in the queue. More importantly, it won't put anything else on the queue until the new matrix has been applied, so all the time the pipelines sit empty is wasted. Once or twice is acceptable, but a few thousand times and it starts to show). It's worth noting this doesn't always happen when you put matrix-change instructions into a compiled display list, because OpenGL can get clever and pre-transform the input coordinates so that the current matrix doesn't need to change.
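
For reference, the old-school path is basically this (fixed-function GL11 calls, illustrative only, not Minecraft's actual code):
[code]
import org.lwjgl.opengl.GL11;

public class DisplayListDraw {
    public static void drawEntity(int displayList, float x, float y, float z, float yaw) {
        GL11.glPushMatrix();
        GL11.glTranslatef(x, y, z);    // state change: the current matrix is edited...
        GL11.glRotatef(yaw, 0f, 1f, 0f);
        GL11.glCallList(displayList);  // ...then the pre-compiled geometry is replayed
        GL11.glPopMatrix();
    }
}
[/code]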

The second problem is that issuing too many draw calls can make you hit your batch limit. OpenGL can only send a limited number of commands to the GPU per second, and that limit is independent of PCI bandwidth. Once you hit it, no other optimization can help your fps. The fix is to put more stuff into a single "batch" and send it all off at once. For instance, rather than drawing triangles one at a time, you draw a whole array of triangles. Display lists are good for this, too, because they can store all of those commands on the GPU side and execute them there. You would think from all of that that display lists are pretty great, and while it's true they are a little faster than vertex buffer objects, there is a reason they were deprecated. The reasoning is kind of complicated and not everyone agrees with it, but the main thing is that display lists were designed for a fixed-function pipeline, while modern games all use programmable shaders and have more high-poly models, and the two systems don't fit together quite right. For the sake of consistency, we're going to try to keep the use of display lists to a minimum, even though Minecraft falls more in line with old-school fixed-function pipeline games.

So back to instancing. Hardware instancing is nice, but it does have a cost: specifically, a per-instance cost. In fact, you need about 1K triangles in your instance model to break even. In the case of minecraft, I don't care how fancy your entities get, you're probably not going to go past 20. So that's a waste. Well, if not that, and not the display list style, then what do we do? The best answer I've found is to do it yourself. Instead of storing a model once in a vertex buffer, you store it however many times that guy is going to show up. I know this whole time we've been trying to ration every byte of video memory we can, but this is worth it, and as a matter of fact the display-list style would compile into roughly the same data (since it is trying to cancel out all the matrix changes, it needs independent copies of the geometry). Instead of storing a per-instance transformation matrix, you just precalculate each vertex into its proper world coordinates. These are low-poly models, so it's not a big deal to do this, and even display lists do the vector math on the CPU, so there's no difference there.
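
A rough sketch of that pre-transform step (plain Java; the names and the translation-only transform are just for illustration):
[code]
import java.nio.FloatBuffer;

import org.lwjgl.BufferUtils;

public class PseudoInstancing {
    /** Writes one world-space copy of a model (x,y,z triples) into the shared buffer. */
    public static void appendInstance(float[] modelVerts, float ox, float oy, float oz,
                                      FloatBuffer pool) {
        for (int i = 0; i < modelVerts.length; i += 3) {
            pool.put(modelVerts[i]     + ox);  // translation only; a full transform would
            pool.put(modelVerts[i + 1] + oy);  // also apply rotation before the offset
            pool.put(modelVerts[i + 2] + oz);
        }
    }

    public static FloatBuffer buildPool(float[] modelVerts, float[][] positions) {
        FloatBuffer pool = BufferUtils.createFloatBuffer(modelVerts.length * positions.length);
        for (float[] p : positions) {
            appendInstance(modelVerts, p[0], p[1], p[2], pool);
        }
        pool.flip();
        return pool; // upload once with glBufferData, then patch pieces with glBufferSubData
    }
}
[/code]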

But this is where it gets fun. Once you have these guys all nicely lined up in your VBO, you can edit them at will. To move one, you recalculate the vertex coordinates for that instance and simply overwrite the section of memory containing it. No rebinding required, and only the blocks of memory you change need to be sent to the GPU. Display lists, on the other hand, are read-only. Once you compile one, the only way you are changing anything is by recompiling, which gets too expensive if you have a lot of entities. Performance-wise, the VBO method wins in pretty much every respect; about the only thing display lists have going for them is that they save on cheap and plentiful CPU memory.
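
Something like this (sketch only; the fixed-size slot layout is an assumption for illustration):
[code]
import java.nio.FloatBuffer;

import org.lwjgl.opengl.GL15;

public class VboPatch {
    /** Moves one entity by overwriting only its slice of the pooled VBO. */
    public static void updateInstance(int vbo, int slotIndex, FloatBuffer newVerts,
                                      int slotSizeBytes) {
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo);
        // Only this block of memory travels to the GPU; nothing else is rebound or rebuilt.
        GL15.glBufferSubData(GL15.GL_ARRAY_BUFFER, (long) slotIndex * slotSizeBytes, newVerts);
    }
}
[/code]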

While changing data in-place in a VBO is simple, resizing it is less so. You more or less need to generate a new VBO and transfer everything over to it. So instead, what programmers do is set the VBO up as a geometry pool. Each instance allocation is the same size, so it doesn't really matter which entity it represents. With that in mind, you can leave enough extra slots at the end of your VBO for adding entities later on. When you draw, you simply draw the range of the VBO which is actively in use. If multiple models have the same vertex format, you can even store multiple model types in a single geometry pool, though it gets a bit more complicated to manage. I would probably put two types in a pool, start one list at each end, and work towards the center.
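
In code, the pool would look roughly like this (a sketch; attribute setup omitted and the single-type case only):
[code]
import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.GL15;

public class EntityPool {
    private final int vbo;
    private final int vertsPerSlot;  // every instance allocation is the same size
    private final int capacity;      // slots reserved up front so the VBO never resizes
    private int liveSlots = 0;

    public EntityPool(int vbo, int vertsPerSlot, int capacity) {
        this.vbo = vbo;
        this.vertsPerSlot = vertsPerSlot;
        this.capacity = capacity;
    }

    /** Returns the slot index for a newly spawned entity, or -1 if the pool is full. */
    public int allocateSlot() {
        return (liveSlots < capacity) ? liveSlots++ : -1;
    }

    /** Draws only the portion of the pool that is actually in use. */
    public void drawActiveRange() {
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo);
        // ...glVertexAttribPointer setup omitted...
        GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, liveSlots * vertsPerSlot);
    }
}
[/code]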

There's also the issue of how to sort entities so that you only render what is in range and on screen. Depending on how you are doing on batch calls, you might just want to draw several ranges from the VBO (aside from the batch-call quota, additional draw calls don't hurt the pipeline). If you are running out of batch calls, you may want to either "defrag" your entity list so you can draw it with fewer calls and take a one-time bandwidth hit, or accept that some of the entities drawn will be off-screen and run them through the vertex shader anyway (they won't make it past the rasterizer stage, though).

That's entities in a nutshell.
----
There's one other thing I didn't mention, and that's animation. There are two main methods of animation, and they are related. The first is manual matrix transformations. This is normally what you see in minecraft. Things like limbs and spinny things are all rendered as separate pieces, and this is where nested matrix transformations and rotations come in handy, because you can rotate an object in its model space and then place it in world space relative to its parent object. For the most part, each entity has a dedicated class which programmatically edits the positions based on whatever animation it is performing and where it is in the sequence. The good thing about this is that the animation is always smooth and precisely placed for whatever point in time it lands on. More complicated animations tend to use positions from pre-made keyframes, which are then interpolated to the correct time by some curve function.
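
A minimal sketch of that keyframe interpolation (a plain linear blend; real animations would use a nicer curve, and the array layout is just for illustration):
[code]
public class Keyframes {
    /** Linear interpolation between the keyframes bracketing time t (t in [0, 1]). */
    public static float sampleAngle(float[] keyTimes, float[] keyAngles, float t) {
        for (int i = 0; i < keyTimes.length - 1; i++) {
            if (t <= keyTimes[i + 1]) {
                float span = keyTimes[i + 1] - keyTimes[i];
                float f = (t - keyTimes[i]) / span;  // 0..1 within this segment
                return keyAngles[i] + f * (keyAngles[i + 1] - keyAngles[i]);
            }
        }
        return keyAngles[keyAngles.length - 1];      // past the last keyframe: hold pose
    }
}
[/code]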

The other way to animate is with a skeleton. A skeleton is simply a tree-like structure of joints and connections which move according to a hierarchy. For instance, if you lift your arm, your hand lifts with it. Instead of doing complicated calculations on each vertex of the model, or storing it in pieces, you do the spatial calculations on the handful of control points in the skeleton. Each vertex is then painted with a weight corresponding to how much influence each control point exerts on that particular vertex. This allows the entire mesh to deform. When rendering, you run a custom vertex shader which is given the model-space coordinates of each control point as constants (or more accurately, their displacement from the model's starting pose; for humanoids, that is typically standing with feet together and arms out to the sides). Each vertex processed is then displaced from its starting point by a weighted average of each influencing control point.
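
Here's the weighted-displacement idea as a CPU-side sketch (in the real thing it would live in the vertex shader; the array layouts are just assumptions for illustration):
[code]
public class SoftwareSkinning {
    public static float[] skin(float[] bindPoseVerts,  // x,y,z triples in the rest pose
                               float[][] boneOffsets,  // per-bone displacement from rest, x,y,z
                               float[][] weights) {    // [vertex][bone], each row sums to 1
        float[] out = bindPoseVerts.clone();
        for (int v = 0; v < weights.length; v++) {
            float dx = 0, dy = 0, dz = 0;
            for (int b = 0; b < boneOffsets.length; b++) {
                float w = weights[v][b];               // painted weight for this vertex
                dx += w * boneOffsets[b][0];
                dy += w * boneOffsets[b][1];
                dz += w * boneOffsets[b][2];
            }
            out[v * 3]     += dx;  // displace from the starting point by the weighted
            out[v * 3 + 1] += dy;  // average of the influencing control points
            out[v * 3 + 2] += dz;
        }
        return out;
    }
}
[/code]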

Skeleton animations are good both for static animations (as there is much less data to store) and for dynamic animation, like terrain-accurate walking and ragdoll physics, which is why they are so popular in modern games. The drawback, however, is that you are limited by processing power. On the CPU the limit is typically things like collision physics, and on the GPU it is the detail of the meshes which need to be deformed. The physics limitation is beginning to be solved by a brand new GPU feature called compute shaders, which can process general non-graphical work in a massively parallel manner. The vertex shading is getting helped by hardware tessellation: only the base mesh needs to be run through the vertex shader, and the tessellated vertices are interpolated from those.

There are all sorts of other kinds of animation as well: lighting and texture changes, facial animations, per-vertex mathematical functions (there are a lot of those in the minecraft shader pack), keeping full static meshes of each keyframe and switching between them like a cartoon, particle functions, and all manner of specialty techniques, none of which I will go into because they have little to do with Futurecraft and I am already talking too much as it is. But this is the end. I think I've covered everything I wanted to.

Prototype
Developer
Posts:2968
Joined:Fri Dec 07, 2012 1:25 am
Affiliation:NSCD
IGN:Currently:Small_Bear
Location:Yes

Re: Another rendering update

Post by Prototype » Thu Jan 10, 2013 3:00 pm

Animation is something I want to do but have no idea how to, and it isn't all that important yet (though if we're going to do anything like robots I'll need to learn it). I've been thinking about trying skeleton animation, since once I had the standard skeleton animations I could apply them to multiple models, right?

Also, do you have any examples of the system you intend to use (or, if there isn't anything exactly like it, anything open source which uses similar methods) so that I can start looking at the code and attempting to work it out? Even if it isn't relevant yet, I need to start learning this stuff.

User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Thu Jan 10, 2013 4:12 pm

Well, the turbomodelthingy API has a skeleton class in which you can create a control-point hierarchy, and I believe it will handle the cascading transformations for you. You still have to program the animations yourself, though, by rotating the control points, so no importing animations from Blender or anything, but it's a good place to start and it can run in minecraft without any trouble. To attach a model, you break it up into solid meshes for each moving part, and then attach each to a different bone. I would play with that.

Prototype
Developer
Posts:2968
Joined:Fri Dec 07, 2012 1:25 am
Affiliation:NSCD
IGN:Currently:Small_Bear
Location:Yes

Re: Another rendering update

Post by Prototype » Thu Jan 10, 2013 4:18 pm

fr0stbyte124 wrote:Well, the turbomodelthingy API has a skeleton class in which you can create a control-point hierarchy, and I believe it will handle the cascading transformations for you. You still have to program the animations yourself, though, by rotating the control points, so no importing animations from Blender or anything, but it's a good place to start and it can run in minecraft without any trouble. To attach a model, you break it up into solid meshes for each moving part, and then attach each to a different bone. I would play with that.
Looking at that now. I can try messing around with existing mob animations (I did something to the cows that would fit right into a Cyriak video), and if I remember right there was also an open-source mod a while ago which added various mob animations; that seems worth a look for now.

User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Wed Jan 16, 2013 3:09 pm

New idea:

Actually, this one is not mine. I saw it [url=http://irrlicht.sourceforge.net/forum/viewtopic.php?f=1&t=44353#p256776]here[/url] while looking up some stuff about the geometry shader. Since I've never really explained what that is, I'll do so now.

The geometry shader is a newer element of the programmable shader pipeline, sitting between the vertex shader (the part that transforms vertices into screen space and can be used for certain mesh animation) and the fragment shader (which calculates the color for every pixel on the screen, or whatever buffer you happen to be writing to). The thing about the vertex shader is that points go in and points come back out. You can't change the number of primitives, nor can you see the neighboring vertices in the primitive. For instance, in my polygon drawing method I don't keep track of the normal at vertices, because 3 faces always share the same vertex. Instead, I have to use the fragment shader to work out which plane the pixel is resting on, based on its world coordinates and the fact that only one of the coordinates won't have a fractional component (aside from a little precision error). If I used the geometry shader, it would know the positions of all 3 vertices and therefore the normal of the triangle without any mess.

The other thing it can do, and this is pretty important, is that it can create and destroy primitives. You could, for instance, turn a single particle into an entire cube (pretty useful for a game like minecraft, eh?)

There are a number of downsides to the geometry shader, though, which is why it is usually overlooked. First, it is a DX10-tier construct, and most games want a DX9 implementation, so they can't rely on it too heavily. Another is that it is kind of underpowered. The geometry shader handles lots of points per thread, so it can't work as massively parallel as its siblings. It's pretty easy for the pipeline to become GS-limited if you overuse it.

------------
So that's the geometry shader in a nutshell. The reason I was looking at it is precisely because it can make cubes pretty well, which we use a lot for chunk bounding boxes. At its most efficient, a cube is represented by 8 vertices and about 14 index pointers defining how they all fit together. Because neither the vertex shader nor the fragment shader is aware of the shape as a whole, every vertex needs to pass along enough data for the fragment shader to recreate the bounding box. This seems pretty wasteful considering it is the same for all 8 vertices. A better solution would be to pass a single point into the geometry shader and let it recreate the other 7 vertices of the bounding box. There are other ways to do this with more textures, but the GS is the most straightforward.
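
A sketch of what that point-to-box geometry shader could look like (GLSL 1.50 in an LWJGL-style wrapper; the uniform names are made up, I'm assuming the vertex shader passes world-space positions through, and only one face is emitted here to keep it short):
[code]
import org.lwjgl.opengl.GL20;
import org.lwjgl.opengl.GL32;

public class PointToBoxShader {
    // One face only; a full box would emit all six faces and raise max_vertices to 24.
    static final String GS_SOURCE =
          "#version 150\n"
        + "layout(points) in;\n"
        + "layout(triangle_strip, max_vertices = 4) out;\n"
        + "uniform mat4 u_mvp;\n"
        + "uniform float u_size; // edge length of the box\n"
        + "void main() {\n"
        + "    vec4 p  = gl_in[0].gl_Position;        // the one stored corner\n"
        + "    vec4 dx = vec4(u_size, 0.0, 0.0, 0.0);\n"
        + "    vec4 dy = vec4(0.0, u_size, 0.0, 0.0);\n"
        + "    gl_Position = u_mvp * p;             EmitVertex();\n"
        + "    gl_Position = u_mvp * (p + dx);      EmitVertex();\n"
        + "    gl_Position = u_mvp * (p + dy);      EmitVertex();\n"
        + "    gl_Position = u_mvp * (p + dx + dy); EmitVertex();\n"
        + "    EndPrimitive();\n"
        + "}\n";

    public static int compileGeometryShader() {
        int gs = GL20.glCreateShader(GL32.GL_GEOMETRY_SHADER);
        GL20.glShaderSource(gs, GS_SOURCE);
        GL20.glCompileShader(gs);
        return gs; // attach to a program alongside the vertex and fragment shaders
    }
}
[/code]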

------------

Now for the interesting bit. The post I linked suggested something I hadn't thought of in regards to heightmap generation. In DX11 you also get hardware tessellation, which lets you massively increase the amount of geometry on screen by subdividing existing primitives. Each new vertex can be displaced in a new shader stage, typically either with a noise function or by reading from a texture. I was planning on using this, when available, to make a much tighter-fitting mesh around the heightmap models, thereby starting the raycasting rays close to the true surface and sharply reducing the number of steps needed to locate it. Because of the way tessellation works, it can't do the sharp right angles needed to generate the mesh directly, so I figured that was as good as it gets.

What this guy is proposing is using tessellation to generate points instead of a mesh. Then you feed each point into the geometry shader (which comes directly after the tessellation stages) and use that to create rectangular columns, filling in the mesh that way. All in linear time, no raycasting required. Aside from a more consistent draw time, this takes the burden off the fragment shader, which, if we are careful, we can use for other things and save time. The downside is that screen-space culling speedups won't be effective, since most of the time will be spent in vertex transformations, and vertices need to be transformed before you can know whether they are visible (though there are orders of magnitude fewer of them than in vanilla minecraft). However, since we are already controlling the grid resolution in the control software, we can still effectively regulate the amount of geometry drawn to the screen.

Overall, it looks like a very useful trick if the GS is powerful enough to beat raycasting, which at this point is anyone's guess. And even without tessellation, we can use this pretty much the same way. The only difference is that we have a larger pool of points with their world positions already filled in. Each will then either read from a texture like before or have the column data stored directly in the vertex buffer. In fact, if you do it this way, the heightmap need not even be a 1x1 correlation. You could specify any shape of box and knock out a ton of geometry all at once (assuming the texture data is stored elsewhere), thereby making it a drop-in replacement for the polygon drawing stage and simplifying the pipeline.
As long as the pipeline does not become raster-limited due to the overlapping meshes, this may be a lot cheaper on top of using much less memory (though again, it depends on the efficiency of the GS).

------------

Most likely, it will be a balancing act like everything else. Unlike raycasting, its complexity is independent of screen space, which makes it less efficient at distant terrain, but far more efficient up close. It would be nice if there was a single best way to do things, but to hit the system requirements of the lower end minecraft PCs, we can't afford to be complacent.

User avatar
Tunnelthunder
Ensign
Ensign
Posts:259
Joined:Wed Dec 05, 2012 10:33 pm
Affiliation:Insomniacs
IGN:Tunnelthunder

Re: Another rendering update

Post by Tunnelthunder » Wed Jan 16, 2013 4:23 pm

Unless we are doing hardcore sniping I think I can handle less efficient distant terrain, and if it means I can be efficient when I am near something it sounds better. Though I am curious, what is the bottom end of GS efficiency?

User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Wed Jan 16, 2013 4:25 pm

Okay, after getting all excited and looking more at geometry shaders, it's probably best not to expect too much from them. On pretty much every platform they are limited by their emission rate. The more points they output, the worse they perform, and it is always faster to do everything in the vertex shader if you can, where you get more threads doing smaller jobs.

But I haven't given up on it just yet. Because of the nature of our geometry, we have advantages which the general case could never hope to have. For instance, shadow mapping for a directional light is an orthographic projection, meaning the same 3 faces of each block will always face the light source, with predictable positions. If we know ahead of time where the light source is, we can make a special program which only outputs those 3 faces, without requiring any logic to do so. We can do the same thing, for the most part, on a per-chunk basis for regular projections.

Then there is transform feedback, in which you can write the emitted vertices to a new buffer and use them again later. Though it means going back to storing more vertex data in video memory, this could be an interesting method for unpacking geometry on the GPU as needed without transferring a ton of data (though there isn't much geometry to transfer to begin with). Then we'd be looking at about 8 bytes per box, which is much better than the 4 bytes per vertex we had before.
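
A sketch of what that capture step might look like (LWJGL-style GL30 calls; the varying name, buffer sizes, and the assumption that the program's GS emits triangles are all illustrative):
[code]
import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.GL15;
import org.lwjgl.opengl.GL20;
import org.lwjgl.opengl.GL30;

public class FeedbackCapture {
    /** Runs the point-unpacking program once and records the emitted vertices into captureVbo. */
    public static void capture(int program, int pointVbo, int pointCount, int captureVbo) {
        // Declare which output gets recorded; this has to happen before the program is linked.
        GL30.glTransformFeedbackVaryings(program, new CharSequence[] { "worldPos" },
                GL30.GL_INTERLEAVED_ATTRIBS);
        GL20.glLinkProgram(program);
        GL20.glUseProgram(program);

        GL30.glBindBufferBase(GL30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, captureVbo);
        GL11.glEnable(GL30.GL_RASTERIZER_DISCARD);        // we only want the vertices back
        GL30.glBeginTransformFeedback(GL11.GL_TRIANGLES); // must match the GS output type
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, pointVbo);
        // ...glVertexAttribPointer setup for the packed box points omitted...
        GL11.glDrawArrays(GL11.GL_POINTS, 0, pointCount);
        GL30.glEndTransformFeedback();
        GL11.glDisable(GL30.GL_RASTERIZER_DISCARD);
    }
}
[/code]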

Come to think of it, this could simplify a lot of things, even without the geometry shader. Each box face will be in a specific plane and have a dedicated vertex to define it (so it will know which face it is for free). All textures will have the same depth, so you can calculate or store the proper texture lookup UVs right then and there, and they'll be propagated to the fragment shaders. Then, so long as we don't do any actual texture lookups in the frag shader and get to take advantage of depth culling, the pipeline should still be vertex-limited, meaning we got the texture lookup coords written to the screen buffer for free. There's no reason, then, to have a watertight mesh to begin with. We can even overlap geometry if it will make for a better fit.

Should be interesting.

User avatar
Tunnelthunder
Ensign
Ensign
Posts:259
Joined:Wed Dec 05, 2012 10:33 pm
Affiliation:Insomniacs
IGN:Tunnelthunder

Re: Another rendering update

Post by Tunnelthunder » Wed Jan 16, 2013 4:40 pm

So, the ratio is better box-wise than vertex-wise. Got it. This tells a lot about how things will work and look, but it is hard to comment on. Happy Awakening.

User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Wed Jan 16, 2013 4:41 pm

Tunnelthunder wrote:Unless we are doing hardcore sniping I think I can handle less efficient distant terrain, and if it means I can be efficient when I am near something it sounds better. Though I am curious, what is the bottom end of GS efficiency?
Less efficient distant terrain isn't just a sniping thing. It will have a continuous effect on the framerate. So it is important that each chunk is being drawn as efficiently as possible, whether that be with rasterization or raycasting.

It's hard to find many details about GS performance, aside from how much people dislike it. The best I've seen is that a GS thread should be roughly comparable in processing power to a VS thread, and that something like 0-20 scalars output is optimal while 22-40 scalars will run at half efficiency. Normally a box would have 8 threads running, calculating all 8 vertices simultaneously. Instead, you have one geometry shader thread doing the same work. By the way, when I say scalar, I mean a single color channel or a single position component. That is not a lot.

User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Wed Jan 16, 2013 4:46 pm

Tunnelthunder wrote:So, ratio better box wise then vertex. Got it. This tells a lot about how things will work and look, but it is hard to comment on. Happy Awakening.
The thing to take away here is that a lot of this doesn't actually need DX10 or DX11, which is encouraging. And it gives us more flexibility in balancing the vertex shader with the fragment shader.

User avatar
hyperlite
Lieutenant
Lieutenant
Posts:360
Joined:Thu Dec 06, 2012 3:46 pm

Re: Another rendering update

Post by hyperlite » Wed Jan 16, 2013 7:36 pm

You don't need DirectX11 when you have lolcode.

HAI
CAN HAS STDIO?
VISIBLE "HAI WORLD!"
KTHXBYE


http://lolcode.com/

User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Thu Jan 17, 2013 10:36 am

Thanks, hyperlite. That honestly never occurred to me.

ON TOPIC:

This will be an actually quick update because I have things to do today.
I've been pondering generating block meshes from axis-aligned boxes some more. Just to recap so we are all on the same page: these boxes can take up any dimensions within a chunk and can clip into one another, and by combining a bunch of them you should be able to recreate any block mesh. In some cases this may result in more geometry than just using the exterior vertices, and in others it will result in much less. What it will always require more of is rasterization. Even with early z-culling aborting the fragment shader whenever a pixel is occluded by a previously drawn one, the GPU still needs to work out where every pixel of that polygon is going to end up on the screen. Conventionally, having big polygons makes rasterization more efficient, because the raster kernel works on big swaths of screen space at a time. If you have DX11 tessellation and polygons each taking up like 2 pixels, that is where you run into bottlenecks. However, the story changes a bit when you are deliberately clipping. It may be a more efficient use of the rasterizer element, but it also covers a much larger area, most of which is going to be wasted. My theory is that the savings in the vertex shader from transforming fewer vertices and doing less logic on them will make up for the losses in the rasterization step and it will all balance out, but I can't test this until I can generate models which use this clipping-box format.

The biggest reason for using these clipping boxes instead of regular geometry is that they give you something you can't rely on otherwise: dedicated vertices. Most data passed to the fragment shader needs to come from the vertex shader, and you can either interpolate it from the 3 contributing vertices, or you "flat shade" and dedicate one vertex to passing information into the FS unchanged (the provoking vertex, which in OpenGL defaults to the last vertex of the triangle, though it can be switched to the first). A box has 8 unique vertices, and 6 of them can flat-shade all 6 sides. In this case all you need to supply is the face normal and some UV coordinate to help locate the face on the virtual texture map. If you aren't dedicating whole boxes, you have all sorts of different combinations of inner and outer corners making up each quad, and while you could declare one of those corners to be that quad's dedicated vertex, you can't do so deterministically (you would need to solve for the whole chunk), so boxes are the only way to take advantage of flat shading.
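
The flat-shading part is just the standard qualifier plus picking which corner counts as the provoking vertex (a sketch; the variable names are illustrative):
[code]
import org.lwjgl.opengl.GL32;

public class FlatShadingSetup {
    // In GLSL the dedicated value is a flat-qualified varying:
    //   vertex shader:   flat out vec3 faceInfo;  // e.g. normal + virtual-texture UV offset
    //   fragment shader: flat in  vec3 faceInfo;  // no interpolation; provoking vertex wins
    public static void useFirstVertexConvention() {
        // OpenGL defaults to the last vertex of each triangle; switching to the first can make
        // it easier to line up "this corner owns this face" when building the index list.
        GL32.glProvokingVertex(GL32.GL_FIRST_VERTEX_CONVENTION);
    }
}
[/code]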

Oh and the geometry shader would probably do well here, either generating boxes or even individual quads (can split up the primitives so the correct face information is passed each time, plus smaller throughput). Just because things weren't complicated enough.

I might be over-thinking this. GPUs are designed specifically for filling pixels on the screen. A GeForce 8800 GTX has a fill rate of 36.8 billion pixels/second, or enough to cover 295 1080p screens at 60fps, and z-culling may be faster than that. All this GPU programming is just making me paranoid about wastefulness. I'll need to benchmark this part before going too much further, and I'll probably get some of you to help out so we can see what it looks like on different hardware.

Prototype
Developer
Posts:2968
Joined:Fri Dec 07, 2012 1:25 am
Affiliation:NSCD
IGN:Currently:Small_Bear
Location:Yes

Re: Another rendering update

Post by Prototype » Thu Jan 17, 2013 10:46 am

So you are suggesting an alpha of the engine? That's probably a good idea, since I imagine your computer can run anything.

Perhaps not an open alpha at first, but maybe 20 copies to people with different levels of specs (high performance, average, low, and maybe laptops). If it runs at a decent level on lower-end stuff, then it should run fine on high-end stuff; after all, the system will only be as strong as its weakest link, and we need to decide what the minimum should be (there's no point trying to make it run on 20-year-old hardware, but it's just as pointless making it run only on super-duper high-end machines).

User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Thu Jan 17, 2013 12:26 pm

Warning: stream of consciousness below. Might not make a ton of sense.

Ever see the Westeros server?
Image
A wall like this would be the perfect situation for a 2-sided heightmap, only it would have to be placed on its side rather than up-down. So assuming there will be stuff like this, which can be well represented by a heightmap but needs a specific orientation, the whole business of determining how to represent certain terrain structures might end up getting kind of complicated. The other issue is that with the current virtual texture method, in which each oriented face is in a different texture patch, this is going to be really, really wasteful on the side profile, which is a shame because you can more or less determine what the side textures are going to be from the blocks sticking out of the surface. I wonder if there is a better way to represent shallow extrusions like this. Ideally, I would like to get the first tile of extrusion for free if I could. Additionally, there are certain rules to how a heightmap is formed, for instance no concavities on two of the 3 axes. Since it is using the same virtual texture mapping as pure random geometry, it doesn't take advantage of this fact. I wonder how you could...


Just thinking out loud now: let's assume a top-down 2-sided heightmap. Top and bottom surfaces can be anything they want to be, even disconnected. All blocks between the two maps have to be completely solid. Since you know the boundaries when you raycast, your ray can tell whether it is touching a surface block or not, so we might be able to use that, but what about the interior blocks? And for that matter, what about the lighting? Baked-on lighting will definitely be different for each block, but it is also shared by any surface touching that cube of air. The Eihort visualizer takes advantage of this and uses a 3D texture as a lightmap. However, that is far too wasteful of valuable video memory for us to be using.

Each subsurface contour is going to be a fixed size, and the number of unique light values will always be <= the number of block surfaces, so it doesn't make sense to map them together. Additionally, if you have a subsurface contour, you can guarantee at least that slice of the contour is going to be continuous (i.e. nothing blocking the light; we'll pretend non-cube blocks don't exist for now), so the dropoff of light will be linear and continuous, though you won't know where the minima and maxima are. From this it stands to reason that if you had a pre-defined list of light sources and Manhattan distances from them, you could calculate all the light levels for that contour. Specifically, you need the maxima as they lie along the contour. Anything further away could be occluded, and it would be difficult to tell.

Another way to look at it is that every surface is made up of gradients. Unless occluded or butting up against a second light source, each tile is a gradient, with one side brighter or darker than the other, unless it is a light source itself. If we can make a sparse representation of these gradients, we can reuse them over multiple surfaces, but how do we do this? The DXT5 format has an interesting solution, in which it takes a 4x4 patch of texels and interpolates 8 points between a high and a low bound. Each of the 16 texels can be any one of those 8 levels, thereby recreating even complex gradients. However, with only 16 light levels, we are not really saving anything with that method unless each texel represents multiple tiles, which could lead to inaccuracies.
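
The scheme, boiled down (a simplified sketch of the idea; real DXT5 also has a second mode with explicit endpoint entries, which I'm ignoring here):
[code]
public class GradientBlock {
    /** Expands a 4x4 patch: two endpoints define 8 levels, each texel picks one by index. */
    public static float[] decode(float lo, float hi, int[] indices /* 16 values in 0..7 */) {
        float[] levels = new float[8];
        for (int i = 0; i < 8; i++) {
            levels[i] = lo + (hi - lo) * (i / 7.0f); // 8 evenly spaced interpolation points
        }
        float[] texels = new float[16];
        for (int t = 0; t < 16; t++) {
            texels[t] = levels[indices[t]];          // complex gradients from very little data
        }
        return texels;
    }
}
[/code]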

The other half of this is the block texture itself. It can't take advantage of gradients, but it will have the same issues of inefficient access. One solution might be to separate the block type from the heightmap. That way you can traverse the heightmap normally with raymarching, but then, instead of looking up whether it is the top face or the side and picking a single color, you check an N-level strata guide with some compressed representation of which block is at what depth for each layer. It doesn't avoid wasting data on interior detail that is never seen, but it should be quick to access. At the very least, we can ignore the interiors when counting the strata. Each stratum will have a block ID and a height and maybe a span, which the ray program can iterate through to work out the correct block ID, maybe getting textured in on a different pass. I'll look at compact ways of representing strata later.
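
A sketch of how the ray program might walk such a strata guide. The exact encoding is still open, so this layout (parallel arrays of block ID, starting depth, and span) is just an assumption:
[code]
public class StrataGuide {
    /** Finds which block a ray hit at a given depth below the column's surface. */
    public static int blockIdAtDepth(int[] strataBlockId, int[] strataTop, int[] strataSpan,
                                     int depth) {
        for (int i = 0; i < strataBlockId.length; i++) {
            // Each stratum covers [top, top + span), measured downward from the surface.
            if (depth >= strataTop[i] && depth < strataTop[i] + strataSpan[i]) {
                return strataBlockId[i];
            }
        }
        return 0; // air / unknown: interiors are not tracked
    }
}
[/code]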

So that's that. I wonder if we can tie light levels to the strata as well... If there are artificial changes in light, we could make a new stratum to compensate. It should be rare enough to be worth the reduced complexity. Regardless, anything we do with strata should on average be less memory-expensive than doing the same thing with the random texture lookup (though heightmaps have a 32x32 profile and tile patches are 8x8, but both are negotiable; maybe strata maps can be 8x8 patches as well). An alternative may be to group like blocks into a single heightmap, rather than giving each column a block ID. If there are a certain number of layers and you need to mark the starting points anyway, it doesn't make sense to mark every column with a block ID too. On the other hand, you could have some layers with only 1 or 2 blocks for that ID. More things to decide. There are also palettes to consider, since most of the time you will have a limited number of block IDs in a specific heightmap, and you can always split the heightmap to get additional IDs if necessary.

Another thing to consider is that multiple heightmaps can reuse the same texture if they don't overlap. For instance, each tree could have its own bounding box, but map to a single heightmap. Knowing this, we might be able to take advantage of the fact that a bounding box doesn't necessarily need to start outside of a heightmap and instead start at a cross-section. Imagine if you had a solid heightmap of a single block type. Anywhere you place that bounding box will become a solid box, regardless of the intended shape of the original heightmap. If offsets were stored in the bounding box lookup, we could possibly store a bunch of odds and ends in just a few heightmaps and write them in first. That will make the bounding boxes more expensive to store, though, and increase the number of passes over a region. I'm not sure what is best to do.

It all comes down to what is the most effective way of getting terrain information without knowledge of the neighboring terrain. As long as we want to match minecraft's look and feel (and having the option to is a requirement), we have to treat each tile independently from its neighbors in terms of light and texture. We can consolidate geometry, but if that doesn't translate efficiently to texture lookup, then it's a moot point. This is where the focus needs to be now.

Come to think of it, separating block lighting from sky lighting is probably a good first step, considering the vast majority of tiles won't have any block lighting. It should really be treated as its own step, complete with bounding-boxed lit regions and all. That's probably the easiest thing to do. We could also mark out the areas in which point lights have proper deferred lighting and render them like normal point lights. This won't work as well when there are a ton of lights all next to one another, or when they need to give off an even lighting, and that's a lot of extra passes. So for the time being, we should probably stick to mapped-out regions of contiguous light levels. Maybe pre-defined quad patches with a UV offset for different light placements, and a separate masking filter for including multiple light sources (each one is a constant gradient, so lots of space is saved).

User avatar
fr0stbyte124
Developer
Posts:727
Joined:Fri Dec 07, 2012 3:39 am
Affiliation:Aye-Aye

Re: Another rendering update

Post by fr0stbyte124 » Thu Jan 17, 2013 2:53 pm

Well, that was a jumbled mess of nonsense. I think the important things to take away are:
1) heightmap rendering can do better than random access texture mapping. The top and bottom will have a guaranteed single surface, but by its nature the sides can have a large number of layers which are mostly empty.

2) The side ID of the top block will be the same as the top value. This can be exploited to avoid the need for mapping the sides when the gradient is no more than 1 block (which happens quite frequently).

3) Probably the most important: surfaces with a sky light of 0 or 15 far outnumber surfaces with a mix. Most surfaces by far also have a block light of 0. Therefore, it makes sense to keep these light values separate from block textures, which exist on 100% of the visible surfaces, and only spend data on the parts which require detail. Additionally, sky light would be ignored if dynamic lighting were enabled.

4) Light is volumetric in nature, and is always a linear gradient. The fact that light is so smooth means we can interpolate points on a lower resolution lightmap and get the same results, depending on the position of all the light sources. I'm not sure whether we'll be able to take advantage of this, but it is worth thinking about.
