SQL Database Stuff
- fr0stbyte124
- Developer
- Posts:727
- Joined:Fri Dec 07, 2012 3:39 am
- Affiliation:Aye-Aye
I've been working a lot with SQL databases lately, and I am beginning to seriously respect the power of the things.  It does queries and stuff, which is great, but more than that it takes care of disk management, including rearranging data for coherent access on the hard drive.  It handles multi-threading I/O in a thread-safe manner, and can handle a massive number of simultaneous threads.  It can do version control and roll-backs if there are any errors, and you can make automated backups.  You can farm the data off onto multiple servers if one is not powerful enough.  And aside from the setup, it manages all of this on its own.
I am beginning to think that Ectrimble was absolutely right to try and move user data into a database instance. I thought it was overkill, but now I am thinking it is just what we need to increase the scalability of the system. And not just for user data, either.
I've looked around before to see what other people think, and the general consensus is that for user data, sure, but chunk data is better in flat files. It is certainly the case for a single player or a basic server because you are never accessing that many files at any given time, and the data compression is much more important for terrain bandwidth than the I/O tweaks an SQL server would give you. But that doesn't apply so much to us.
For one, we are not dealing with data from a single world. The master servers may be moving data around for hundreds of worlds, each of which need to be archived and synchronized. Chunk caching will become even more important than it is now, and it will likely be necessary to cache parts of the terrain to a player's local storage to avoid swamping the cloud with redundant chunk access.
A second reason is the Copernicus terrain system is built with the concept of layers, in which the world is separated into what is needed right away and what can wait. You might separate exposed blocks and buried blocks, or light values touching light vs. light values for empty space or underground. You might want to store heightmaps at different levels of detail for distant rendering, which in turn may act as hints for a compression scheme on the other terrain info. Using flat files, you would have to store all of that together, and load the whole thing into memory to pull out the parts you want, or you would need to create additional files for each layer, which would further fragment the data on the disk. In a database, you can arrange it according to how you intend to access it, and pull out just what you need. Backups can be made smaller by only storing the data which can't be re-generated, so no un-modified chunks or LOD perspectives.
In short, replacing the filesystem with an SQL database for Futurecraft is beginning to make a crazy amount of sense.
Dedicated planet servers can use hardcore MySQL. Player-run servers and clients can use SQLLite, which is embeddable in-process, and both the client and server instance can access the database directly, which might be handy.
			I am beginning to think that Ectrimble was absolutely right to try and move user data into a database instance. I thought it was overkill, but now I am thinking it is just what we need to increase the scalability of the system. And not just for user data, either.
I've looked around before to see what other people think, and the general consensus is that for user data, sure, but chunk data is better in flat files. It is certainly the case for a single player or a basic server because you are never accessing that many files at any given time, and the data compression is much more important for terrain bandwidth than the I/O tweaks an SQL server would give you. But that doesn't apply so much to us.
For one, we are not dealing with data from a single world. The master servers may be moving data around for hundreds of worlds, each of which need to be archived and synchronized. Chunk caching will become even more important than it is now, and it will likely be necessary to cache parts of the terrain to a player's local storage to avoid swamping the cloud with redundant chunk access.
A second reason is the Copernicus terrain system is built with the concept of layers, in which the world is separated into what is needed right away and what can wait. You might separate exposed blocks and buried blocks, or light values touching light vs. light values for empty space or underground. You might want to store heightmaps at different levels of detail for distant rendering, which in turn may act as hints for a compression scheme on the other terrain info. Using flat files, you would have to store all of that together, and load the whole thing into memory to pull out the parts you want, or you would need to create additional files for each layer, which would further fragment the data on the disk. In a database, you can arrange it according to how you intend to access it, and pull out just what you need. Backups can be made smaller by only storing the data which can't be re-generated, so no un-modified chunks or LOD perspectives.
In short, replacing the filesystem with an SQL database for Futurecraft is beginning to make a crazy amount of sense.
Dedicated planet servers can use hardcore MySQL. Player-run servers and clients can use SQLLite, which is embeddable in-process, and both the client and server instance can access the database directly, which might be handy.
- 
				Chairman_Tiel
- Rear Admiral 
- Posts:1890
- Joined:Sat Dec 01, 2012 9:39 am
- Affiliation:GLORIOUS REPUBLIC
Re: SQL Database Stuff
Throughout your post you don't really present the cons of storing data in SQL. Is inconvenience the only factor, seeing as many of these minecraft-only hosting setups don't support databases without additional charges?
Given my limited experience with them I'd say it's a great idea, but again, is there really any reason not to?
				Given my limited experience with them I'd say it's a great idea, but again, is there really any reason not to?
Last edited by Chairman_Tiel on Thu May 23, 2013 7:18 am, edited 1 time in total.				
 [/spoiler]
[/spoiler]- fr0stbyte124
- Developer
- Posts:727
- Joined:Fri Dec 07, 2012 3:39 am
- Affiliation:Aye-Aye
Re: SQL Database Stuff
The cons are pretty much the reasons you would expect for programs not using a database.  More CPU draw, more work to set up and copy around, potentially less space efficient due to differences in compression but I would contest that, and is less efficient at accessing really big data objects than direct file access.  You also have this big database competing with Minecraft software for system resources, at least on dedicated servers assuming they are on the same machine (they wouldn't have to be).  It's by no means a silver bullet, but if we lean on it more in the code, I think it can do a better job at storage than the filesystem does.
Accessing the data from a remote machine could potentially be huge. If we wanted, for instance, we could have the client consuming active chunks from a dedicated planet server, but it would download the older untouched chunks from an Amazon cloud service. Considering how much of a server's bandwidth is consumed by chunk downloads compared to everything else combined, splitting the data like this could greatly improve the quality of gameplay on lower end servers.
			Accessing the data from a remote machine could potentially be huge. If we wanted, for instance, we could have the client consuming active chunks from a dedicated planet server, but it would download the older untouched chunks from an Amazon cloud service. Considering how much of a server's bandwidth is consumed by chunk downloads compared to everything else combined, splitting the data like this could greatly improve the quality of gameplay on lower end servers.
- fr0stbyte124
- Developer
- Posts:727
- Joined:Fri Dec 07, 2012 3:39 am
- Affiliation:Aye-Aye
Re: SQL Database Stuff
I don't really get what this upvote thing is.  We aren't voting on anything.
Anyway, I found an interesting operation in DynamoDB called BatchGetItem(). Normally, you can query a set of data either by a single key or by a range of values, which would be sufficient for pulling out chunks from the database. But what if you don't want all of them all at once? What if, for instance, you have some of them cached, or are doing occlusion culling and only need a subset of all the chunks. Normally, you would be out of luck, and would have to take all of them or do it one at a time. But BatchGetItem() gives you a third option. If you are doing your filtering in-process, you can use this command to request a list of rows from the table. A single operation retrieves up to 1MB of data or up to 100 items. If the process has more things to retrieve, it will also send back a list of unprocessed keys.
In this scenario, say we are doing the basic occlusion culling with the demo I did last year. Beforehand, we could download the 16-bit occlusion key for each cubic chunk, and the raw size of that row. Then we would do an occlusion test using those keys, filter out the chunks we already have, and begin constructing a request for the chunks we still need. Since we know the size of each row which will be returned, we can work out ahead of time how many batch requests to make and how large each one will be, and start queuing those up. By prioritizing only the chunks that you will see, the chunk downloading will appear much faster and more responsive than normal, on top of using Amazon's bandwidth.
I'm starting to think this will be a good fit.
			Anyway, I found an interesting operation in DynamoDB called BatchGetItem(). Normally, you can query a set of data either by a single key or by a range of values, which would be sufficient for pulling out chunks from the database. But what if you don't want all of them all at once? What if, for instance, you have some of them cached, or are doing occlusion culling and only need a subset of all the chunks. Normally, you would be out of luck, and would have to take all of them or do it one at a time. But BatchGetItem() gives you a third option. If you are doing your filtering in-process, you can use this command to request a list of rows from the table. A single operation retrieves up to 1MB of data or up to 100 items. If the process has more things to retrieve, it will also send back a list of unprocessed keys.
In this scenario, say we are doing the basic occlusion culling with the demo I did last year. Beforehand, we could download the 16-bit occlusion key for each cubic chunk, and the raw size of that row. Then we would do an occlusion test using those keys, filter out the chunks we already have, and begin constructing a request for the chunks we still need. Since we know the size of each row which will be returned, we can work out ahead of time how many batch requests to make and how large each one will be, and start queuing those up. By prioritizing only the chunks that you will see, the chunk downloading will appear much faster and more responsive than normal, on top of using Amazon's bandwidth.
I'm starting to think this will be a good fit.
- Dux_Tell31
- Midshipman 
- Posts:100
- Joined:Thu Dec 06, 2012 12:22 pm
- Affiliation:Tellrim
- IGN:tell31
Re: SQL Database Stuff
Thats Great! However, what did you mean when you said "Amazon's bandwidth"? Is this one of Amazons cloud products? Can you please elaborate on this further.
"This is Minecraft, sir. We don't make physics, we ruin them."  -Fr0stbyte124
"We are made of the elements out in space, in essense we are the universe discovering itself" -Neil Degrasse Tyson
			"We are made of the elements out in space, in essense we are the universe discovering itself" -Neil Degrasse Tyson
- fr0stbyte124
- Developer
- Posts:727
- Joined:Fri Dec 07, 2012 3:39 am
- Affiliation:Aye-Aye
Re: SQL Database Stuff
Yes.  DynamoDB is an Amazon NoSQL cloud database service.
Bandwidth charge is:
Write Throughput: $0.0065 per hour for every 10 units of Write Capacity
Read Throughput: $0.0065 per hour for every 50 units of Read Capacity
You can think of a write unit as 1 kB/s of bandwidth, and a read unit as 4kB/s.
Storage space is $0.25/GB per month with the first 100MB free.
Data transfer is $0.120/GB per month with the first GB free.
If you are wondering, yes, these are ridiculously good prices for data hosting. We can host one or two dedicated servers in someone's basement in Kansas City for not too much upkeep, but that only goes so far. Dedicated rack-space anywhere is out of the question unless we try to go commercial with FC, and even then the upkeep will be enormous. On the other hand, a cloud database like this one can be infinitely scaled to any budget, even down to a promotional free tier for a while if the requirements are low enough. Additionally, it offsets the largest piece of the bandwidth consumption, allowing dedicated servers to steam live data more efficiently on less expensive hosting.
I've never heard of anyone else using Amazon to host MMO data, but Minecraft is a sort of special case, using lots of bandwidth on infrequently updated data. Other games would already have everything but live data on local storage, so it's more of an apples/oranges situation. In any case, this is more of a long term plan, especially until we can nab another database administrator. The engine should be able to work with or without data assistance.
			Bandwidth charge is:
Write Throughput: $0.0065 per hour for every 10 units of Write Capacity
Read Throughput: $0.0065 per hour for every 50 units of Read Capacity
You can think of a write unit as 1 kB/s of bandwidth, and a read unit as 4kB/s.
Storage space is $0.25/GB per month with the first 100MB free.
Data transfer is $0.120/GB per month with the first GB free.
If you are wondering, yes, these are ridiculously good prices for data hosting. We can host one or two dedicated servers in someone's basement in Kansas City for not too much upkeep, but that only goes so far. Dedicated rack-space anywhere is out of the question unless we try to go commercial with FC, and even then the upkeep will be enormous. On the other hand, a cloud database like this one can be infinitely scaled to any budget, even down to a promotional free tier for a while if the requirements are low enough. Additionally, it offsets the largest piece of the bandwidth consumption, allowing dedicated servers to steam live data more efficiently on less expensive hosting.
I've never heard of anyone else using Amazon to host MMO data, but Minecraft is a sort of special case, using lots of bandwidth on infrequently updated data. Other games would already have everything but live data on local storage, so it's more of an apples/oranges situation. In any case, this is more of a long term plan, especially until we can nab another database administrator. The engine should be able to work with or without data assistance.
- 
				Dr. Mackeroth
- Designer 
- Posts:397
- Joined:Fri Dec 07, 2012 11:59 pm
- Affiliation:Alteran
- Location:In the Holy Citadel of Altera
Re: SQL Database Stuff
I feel completely useless in this situation. This choice is entirely up to you, Fr0stbyte, none of us have the experience to tell you what to do, so don't listen to anyone.
This is a signature.
			- fr0stbyte124
- Developer
- Posts:727
- Joined:Fri Dec 07, 2012 3:39 am
- Affiliation:Aye-Aye
Re: SQL Database Stuff
Is that an order?Dr. Mackeroth wrote:I feel completely useless in this situation. This choice is entirely up to you, Fr0stbyte, none of us have the experience to tell you what to do, so don't listen to anyone.
Re: SQL Database Stuff
Yea, you gonna work on Futurecraft for the rest of your life ! (at least that is how it looks like  )
 )
 )
 )They're watching ... 
"I am forbidden tag" -CvN
			
"I am forbidden tag" -CvN
- fr0stbyte124
- Developer
- Posts:727
- Joined:Fri Dec 07, 2012 3:39 am
- Affiliation:Aye-Aye
Re: SQL Database Stuff
Man, remember when this was just going to be a Zeppelin replacement?  I've made a terrible mistake.
			- Dux_Tell31
- Midshipman 
- Posts:100
- Joined:Thu Dec 06, 2012 12:22 pm
- Affiliation:Tellrim
- IGN:tell31
Re: SQL Database Stuff
On the other hand, you've inspired several people to learn the wizardry that is programming; Including myself.fr0stbyte124 wrote:Man, remember when this was just going to be a Zeppelin replacement? I've made a terrible mistake.
"This is Minecraft, sir. We don't make physics, we ruin them."  -Fr0stbyte124
"We are made of the elements out in space, in essense we are the universe discovering itself" -Neil Degrasse Tyson
			"We are made of the elements out in space, in essense we are the universe discovering itself" -Neil Degrasse Tyson
- 
				ACH0225
- Vice Admiral 
- Posts:2312
- Joined:Sun Dec 09, 2012 10:21 pm
- Affiliation:Strigiforme
- IGN:ACH0225
- Location:Cuuyth
Re: SQL Database Stuff
Not worth it at all.

mfw brony imagesfr0stbyte124 wrote:5 months from now, I will publish a paper on an efficient method for rendering millions of owls to a screen.
Spoiler: 







