Say no to three backup servers

lynx · Jan 27, 2016

GrandMasterDash said:
If I had the resources, could I keep my own copy of a full backup?

Good question. I wonder about that myself.

lynx said:
Does that mean that someone has special permissions to upload stored encrypted user data back to the Dash drive, or can any MN do it?

raganius · Jan 27, 2016

GrandMasterDash said:
If I understand correctly, the new Dash Drive will create a decentralized sharded database to help manage user accounts. There is a 5:1 redundancy in case of failure, but to be sure there are to be three large backup servers. It concerns me that someone feels that, in the event of disaster, just three servers will suffice to save the day. In my mind, it demonstrates a lack of faith in the current design and, therefore, a possible weakness in a catastrophic event.

Keep in mind, Evolution is being designed for mass adoption. Any significant loss of data could wipe out the value of Dash overnight. We therefore need a system that is incredibly robust: something we can test and simulate to extremes, showing that the system is capable of surviving.

My idea might make no sense, but I'd better bring it up here anyway.

OK. The Dashdrive will store user data and Masternode data:

TanteStefana said:
Just wanted to say that Evan confirmed that the Dash Drive is user data and Masternode data, not transactional data.

The "user data and Masternode data" to be stored, as I understood it, will be only what is strictly necessary to identify the user and its contacts inside the network, and the same applies to "Masternode data". So I guess I can assume the amount of data per user (or Masternode) will not be huge.

My idea here works on the basis that the data to be stored per user would fit in something like a simple JSON file or similar. N.B. I know that the total amount of data stored by the "global" Dashdrive network (data per user × number of users) might be huge, but here I am dealing with the data specific to each user.

If my assumption is correct and the amount of data to be stored per user (or Masternode) is "small", maybe the backup the network needs can be made by the users themselves: each node carries an encrypted backup of its own information as distributed on the Dashdrive network.

In this case, the user (or Masternode) will not be able to read or write directly inside the backup copy it carries. Only the Dashdrive network will have the cryptographic permission to identify that user's backup. The moment that node (user or Masternode) connects to the network, its info is compared and synced; if there has been a failure on the Dashdrive, the network will use the user's backup to recompose itself.

I hope it makes sense.
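A minimal sketch of the user-carried backup idea, assuming a single network-held secret key (in reality that would have to be a distributed, quorum-held key, which is not modelled). The standard library has no cipher, so encryption is omitted and only tamper detection is shown: the user carries a sealed blob plus an HMAC tag it cannot forge, and on connect the network either confirms it is in sync, restores a lost record from it, or rejects it. `Network`, `SealedBackup` and `sync_on_connect` are hypothetical names, not part of any Dash code.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SealedBackup:
    """Blob a user carries; it cannot be altered without the network noticing.

    Assumption: a real design would also encrypt `payload` under a
    network-held key. Here it is plaintext and only integrity is sealed.
    """
    user_id: str
    payload: bytes
    tag: bytes


class Network:
    """Hypothetical stand-in for the Dashdrive network (illustration only)."""

    def __init__(self, network_key: bytes):
        self._key = network_key
        self._store: dict[str, bytes] = {}

    def _tag(self, user_id: str, payload: bytes) -> bytes:
        # HMAC binds the payload to the user id under the network's key.
        msg = user_id.encode() + b"\x00" + payload
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def write(self, user_id: str, payload: bytes) -> SealedBackup:
        """Store the record and hand the user a sealed copy to carry."""
        self._store[user_id] = payload
        return SealedBackup(user_id, payload, self._tag(user_id, payload))

    def lose(self, user_id: str) -> None:
        """Simulate a Dashdrive failure that drops a record."""
        self._store.pop(user_id, None)

    def sync_on_connect(self, backup: SealedBackup) -> str:
        """Compare the carried backup with the network copy when the node connects."""
        expected = self._tag(backup.user_id, backup.payload)
        if not hmac.compare_digest(backup.tag, expected):
            return "rejected"  # tampered or forged backup
        current = self._store.get(backup.user_id)
        if current is None:
            self._store[backup.user_id] = backup.payload  # recompose from user's copy
            return "restored"
        if current == backup.payload:
            return "in-sync"
        # Assumption: while the network still holds a record, it is authoritative.
        return "stale-backup"
```

Usage: `b = net.write("alice", data)` gives Alice her sealed copy; after `net.lose("alice")`, her next `net.sync_on_connect(b)` returns `"restored"` and the network has the record back.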

TanteStefana · Jan 27, 2016

Evan said something like that last night in Slack. Man, he must be busy, because I hardly see him anywhere. But he did poke in when I mentioned this thread.

I'd like to see all 3 backups: sharded, full database storage and user storage. Now that I think about it, there is no reason why users can't store their own shards. They might even have a small program or wallet on a cloud service that always keeps their shards up to date. I really like this.

Maybe that's what Evan was saying, but as I understood him last night, I thought he was saying that users could just store their own information, period, and that would be too scary for me, LOL. Maybe he meant exactly what you said here.

Good one!

lynx · Jan 27, 2016

TanteStefana said:
I'd like to see all 3 backups: sharded, full database storage and user storage.

Ok, but as currently planned, full database storage still seems like a centralized solution. How would it be done?

TanteStefana · Jan 27, 2016

What Evan proposed earlier, and remember all this is in flux, is that we could pay for something like 3 massive database storage servers in 3 locations around the world (obviously the number could be increased) but these would be expensive and paid for via the budgeting system.

Anyway, I personally like that we've already come up with 3 ways to store the information. It would be extremely varied, which makes it extremely secure, IMO.

Banks, with all their money, couldn't do better. And I doubt they even come close.

GermanRed+ · Jan 27, 2016

TanteStefana said:
What Evan proposed earlier, and remember all this is in flux, is that we could pay for something like 3 massive database storage servers in 3 locations around the world (obviously the number could be increased) but these would be expensive and paid for via the budgeting system.

Anyway, I personally like that we've already come up with 3 ways to store the information. It would be extremely varied, which makes it extremely secure IMO Banks, with all their money, couldn't do better. And I doubt they even come close.

TanteStefana, have the core developers looked into the Kinetic Open Storage solution? It would be nice to know that the developers had actually considered this approach even if they thought it was not the correct one. Would like to hear their comments on using Kinetic Open Storage and IPFS.

TanteStefana · Jan 27, 2016

GermanRed+ said:
TanteStefana, have the core developers looked into the Kinetic Open Storage solution? It would be nice to know that the developers had actually considered this approach even if they thought it was not the correct one. Would like to hear their comments on using Kinetic Open Storage and IPFS.

Can you explain how that works? I don't get it. It sounds like these hard drives talk directly to the internet, but I can't see why anyone would want to open up portions of their hard drive to the internet. Do they get paid for this? Or what? Thanks, it sounds interesting, but I'm not getting the concept or usage.

GermanRed+ · Jan 27, 2016

TanteStefana said:
Can you explain how that works? I don't get it. It sounds like these hard drives talk directly to the internet, but I can't see why anyone would want to open up portions of their hard drive to the internet. Do they get paid for this? Or what? Thanks, it sounds interesting, but I'm not getting the concept or usage.

The hard drives are connected directly through TCP/IP rather than through a SATA or SAS interface. Each Kinetic HDD has an ARM chip with 64 MB cache and 512 MB of RAM, and the ARM chip controls how data is written. Instead of the OS writing to the media by tracks and sectors through the SATA/SAS port, you have an object storage API: any device connected to the network can get/put/delete data objects on these Kinetic drives. How the data is written and retrieved is handled by the ARM chip and its embedded OS. The network can be an intranet or the internet.
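To make the "object storage API instead of tracks and sectors" point concrete, here is a toy in-memory model of a drive that only accepts get/put/delete on keys. It is a hypothetical illustration, not the Kinetic client library: `ObjectDrive`, its method names and the capacity check are assumptions chosen to mirror the description above.

```python
from typing import Optional


class ObjectDrive:
    """Toy model of a network drive exposing a key-value object API.

    Hypothetical: NOT the Kinetic client library. Callers never see tracks
    or sectors; layout on the media is the drive controller's business.
    """

    def __init__(self, address: str, capacity_bytes: int):
        self.address = address  # e.g. an intranet IP (illustrative)
        self.capacity_bytes = capacity_bytes
        self._objects: dict[bytes, bytes] = {}

    def used_bytes(self) -> int:
        return sum(len(v) for v in self._objects.values())

    def put(self, key: bytes, value: bytes) -> None:
        """Store or overwrite an object, refusing writes that exceed capacity."""
        old_size = len(self._objects.get(key, b""))
        if self.used_bytes() - old_size + len(value) > self.capacity_bytes:
            raise OSError("drive full")
        self._objects[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        """Return the object, or None if the key is absent."""
        return self._objects.get(key)

    def delete(self, key: bytes) -> bool:
        """Remove the object; report whether it existed."""
        return self._objects.pop(key, None) is not None
```

In this model, "multiple hosts" simply means several MN processes that know the drive's address and issue `put`/`get`/`delete` calls directly, with no file server in between.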

What that promises is that multiple hosts can access these drives directly at the same time without going through a file server. You can program how many copies of the data are stored on different Kinetic drives. For example, the developers could program DASH to write to its intranet Kinetic drives only if there isn't already a copy of the data. An MN operator can run an intranet with Kinetic drives behind all of his MNs. These MNs connect to two networks: the internet and the intranet with many Kinetic drives.
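A sketch of the two policies mentioned above, a programmable copy count and "write only if there isn't a copy", under stated assumptions: drives are modelled as plain dictionaries, objects are treated as immutable (for example keyed by content hash), and replicas are placed on consecutive drives starting at a hash-derived offset so that every MN computes the same placement without coordinating. `Drive`, `placement` and `replicated_put` are hypothetical names.

```python
import hashlib


class Drive:
    """Minimal key-value drive stand-in (hypothetical, not the Kinetic library)."""

    def __init__(self, name: str):
        self.name = name
        self.objects: dict[str, bytes] = {}


def placement(key: str, drives: list[Drive], copies: int) -> list[Drive]:
    """Pick `copies` distinct drives for a key, deterministically from its hash.

    Assumption: hash-based starting offset, then consecutive drives. Any MN
    running this gets the same answer, so no file server is needed to agree.
    """
    if copies > len(drives):
        raise ValueError("not enough drives for the requested number of copies")
    digest = hashlib.sha256(key.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(drives)
    return [drives[(start + i) % len(drives)] for i in range(copies)]


def replicated_put(key: str, value: bytes, drives: list[Drive], copies: int) -> int:
    """Write the object to its target drives, skipping drives that already hold it.

    Assumes objects are immutable, so an existing copy never needs rewriting.
    Returns the number of writes actually performed.
    """
    writes = 0
    for drive in placement(key, drives, copies):
        if key not in drive.objects:  # write only if there isn't a copy yet
            drive.objects[key] = value
            writes += 1
    return writes
```

With eight drives and `copies=4`, the first MN to store an object performs four writes; a second MN storing the same object finds all four copies already present and performs none.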

What does that mean to us? We can save disk space and money. For example, an operator has 64 MNs and does not want to pay for VPSs, because both bandwidth and storage become expensive when we provide so many DAPI services. These 64 MNs, programmed with the Kinetic API, can access the Kinetic HDDs simultaneously. With enough RAM/CPUs and a 1 Gbps fiber connection at home, one can probably serve 64 MNs or even more from a single host. However, it makes no sense to keep 64 copies of the data, because if that single 1 Gbps internet connection fails, none of the MNs' data is accessible from the outside world.

So, let's say you have 256 4 TB HDDs. Without Kinetic drives, you can connect 4 HDDs to each MN, giving 16 TB of disk space per MN (the unshared scenario). With Kinetic drives, you can program the MNs to write four copies of each data object across the 256 4 TB HDDs (the shared scenario). Then you have a total of 256 TB of storage space, with 4 copies of everything. Alternatively, one may save or share disk space by setting up a file server with thin provisioning or deduplication features, but that takes a lot of maintenance and equipment cost.

The advantage of the Kinetic approach is that if a drive fails, you just replace it with a new one and the drive will be refilled with data from the other copies. None of the 64 MNs will be interrupted by the HDD failure. In the file server approach, a failure can affect all the MNs connected to the file server. You can also expand capacity simply by adding more Kinetic drives. It will make life much easier.

The total cost of running this will be much lower than putting these MNs on VPSs if the DASH network gets really busy in the future. One can pay for 1 Gbps fiber at home and buy a 16-core Xeon-D machine with 128 GB RAM to serve 128 MNs from home. Behind this machine, he can put tens to hundreds of Kinetic HDDs as needed. That would be almost maintenance-free. The 1 Gbps fiber at home may cost ~$25/mo. The electricity for running that Xeon-D machine may be another $25/mo or less. The electricity for the HDDs and the network switch then depends on how many HDDs and switches you have.
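The capacity and cost figures above can be checked with a few lines of arithmetic. All numbers are the poster's; the per-MN cost only covers fiber and host power, since the post leaves HDD and switch power open.

```python
# Figures from the post above; costs are the poster's rough estimates.
num_mns = 64
num_drives = 256
drive_tb = 4
copies = 4

raw_tb = num_drives * drive_tb                 # 1024 TB of raw disk
drives_per_mn = num_drives // num_mns          # 4 drives per MN when unshared
unshared_tb_per_mn = drives_per_mn * drive_tb  # 16 TB per MN, each MN holding its own copy
shared_usable_tb = raw_tb // copies            # 256 TB usable with 4 copies of each object

fiber_usd_month = 25
host_power_usd_month = 25                      # "another $25/mo or less"
served_mns = 128
# Monthly cost per MN before HDD and switch electricity.
base_usd_per_mn = (fiber_usd_month + host_power_usd_month) / served_mns

print(raw_tb, unshared_tb_per_mn, shared_usable_tb, base_usd_per_mn)
# prints: 1024 16 256 0.390625
```

In other words, the same 256 drives hold a dataset capped at 16 TB and stored 64 times in the unshared layout, versus a dataset of up to 256 TB stored 4 times in the shared layout.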

EDIT: The Kinetic HDDs can also connect to the internet directly, but that may not be a good idea due to security concerns. So having them behind an intranet is probably the way to go. Please watch the video in the thread "Evolution - Dashdrive Discussion"; the folks from Seagate did a much better job explaining this.

EDIT2: Let's say we have ten operators with this kind of setup. Then we have forty complete copies of the DASH data (four per operator). Making it cheap for operators to hold a complete copy of the data is way better than having three official copies.

TanteStefana · Jan 28, 2016

So you're saying this would be a better setup than a few VPS full database storage backups, right?

Very interesting. The only thing I can think of is that this would be different from running a Masternode, and it may also be harder to set up in widely different geographical locations. I mean, you could probably put a database on a VPS somewhere in Africa, but getting someone to run something like this reliably there would be harder. At least, it would be harder to control for the Foundation or core team, who would likely be tasked with running these. I don't think we should task MN owners with running something like this from home because, frankly, it just makes it harder to get quality masternode performance from home. Also, a 1 Gbps fiber link is simply not available everywhere, and we don't want to restrict the geographical locations of the MN network (it would end up only in the richest countries, which would most likely react to things in a similar fashion, i.e. want to shut down VPSs with crypto on them or some such).

Still, this is very intriguing.

eduffield · Jan 28, 2016

GrandMasterDash said:
If I understand correctly, the new Dash Drive will create a decentralized sharded database to help manage user accounts. There is a 5:1 redundancy in case of failure, but to be sure there are to be three large backup servers. It concerns me that someone feels that, in the event of disaster, just three servers will suffice to save the day. In my mind, it demonstrates a lack of faith in the current design and, therefore, a possible weakness in a catastrophic event.

Keep in mind, Evolution is being designed for mass adoption. Any significant loss of data could wipe out the value of Dash overnight. We therefore need a system that is incredibly robust: something we can test and simulate to extremes, showing that the system is capable of surviving.

There's no DashDrive paper yet, so I definitely understand the concerns being raised. Think of DashDrive as having multiple levels of redundancy for different levels of data, according to importance. DashDrive is going to use a blockchain itself, and each block is going to consist of files being added, appended or deleted. Each block object will also have a Merkle root and be set up identically to the blockchain. Based on the file hashes, we'll shard off the pieces of data to be stored safely across the network.
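A rough sketch of the block structure described above: a block is a list of file operations (add, append, delete), the block commits to them through a Merkle root over their hashes, and each hash also picks the shard that stores the piece. Several details are assumptions, not from the post: double SHA-256 and "pair the odd node with itself" follow the Bitcoin-style Merkle tree, the way an operation is serialized for hashing is invented, and `hash mod num_shards` is just one simple way to shard "based on the file hashes".

```python
import hashlib
from dataclasses import dataclass


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as in Bitcoin-style Merkle trees (an assumption here)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


@dataclass(frozen=True)
class FileOp:
    action: str  # "add", "append" or "delete"
    path: str
    data: bytes = b""

    def file_hash(self) -> bytes:
        # Hypothetical serialization: action, path and data separated by NUL bytes.
        return sha256d(self.action.encode() + b"\x00" + self.path.encode() + b"\x00" + self.data)


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash leaves pairwise up to one root; an odd last node is paired with itself."""
    if not leaves:
        return b"\x00" * 32
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def shard_for(h: bytes, num_shards: int) -> int:
    """Assumption: map a hash to a shard by taking it modulo the shard count."""
    return int.from_bytes(h, "big") % num_shards
```

For a block `ops`, the header would carry `merkle_root([op.file_hash() for op in ops])`, and each operation's data would go to shard `shard_for(op.file_hash(), num_shards)`.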

Level 1: Full Redundancy - Profile data, masternode data and masternode quorum data are the only mission-critical data being stored on DashDrive. This data would be duplicated on every masternode on the network.
Level 2: Multiple Redundancy - Metadata, transaction descriptions, historical data, etc. This is non-mission-critical data, so if it is lost, the system will still function correctly without it.

There will also be a type of node called an archive node. An archive node has to be a masternode, and it backs up all Level 2 data. This means that if we lose every piece of data on a specific shard, it will still exist on the archive nodes. These can be run by a few volunteers and, eventually, the Dash Foundation.
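The two redundancy levels and the archive nodes can be summarized as a placement rule: Level 1 data lives on every masternode, while Level 2 data lives on a few shard holders plus every archive node. The sketch below assumes a Level 2 replica count of 5 (borrowing the 5:1 figure mentioned earlier in the thread) and a hash-based choice of shard holders; both are illustrative, as are the names `Level` and `holders`.

```python
import hashlib
from enum import Enum


class Level(Enum):
    FULL = 1      # profile, masternode and quorum data: on every masternode
    MULTIPLE = 2  # metadata, transaction descriptions, history: non-critical


def holders(key: str, level: Level, masternodes: list[str],
            archive_nodes: list[str], replicas: int = 5) -> set[str]:
    """Return the set of nodes expected to hold `key`.

    Assumptions: Level 2 uses `replicas` shard holders chosen from a
    hash-based offset; archive nodes additionally keep all Level 2 data.
    """
    if not set(archive_nodes) <= set(masternodes):
        raise ValueError("archive nodes must also be masternodes")
    if level is Level.FULL:
        return set(masternodes)
    start = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big") % len(masternodes)
    count = min(replicas, len(masternodes))
    shard = {masternodes[(start + i) % len(masternodes)] for i in range(count)}
    return shard | set(archive_nodes)
```

Even if every shard holder for a Level 2 key disappears, the archive nodes in the returned set still have it, which is the guarantee described above.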

eduffield · Jan 28, 2016

Comodore said:
I wouldn't be so frightened. We will soon have IPFS, Storj and more choices. But the choice of redundancy is an important question.

We could also use these other services as a method of storing the non-critical data. I wouldn't be opposed to that.

eduffield · Jan 28, 2016

GermanRed+ said:
TanteStefana, have the core developers looked into the Kinetic Open Storage solution? It would be nice to know that the developers had actually considered this approach even if they thought it was not the correct one. Would like to hear their comments on using Kinetic Open Storage and IPFS.

I'll look into this.

GermanRed+ · Jan 28, 2016

TanteStefana said:
So you're saying this would be a better setup than a few VPS full database storage backups, right?

Very interesting. The only thing I can think of is that this would be different from running a Masternode, and it may also be harder to set up in widely different geographical locations. I mean, you could probably put a database on a VPS somewhere in Africa, but getting someone to run something like this reliably there would be harder. At least, it would be harder to control for the Foundation or core team, who would likely be tasked with running these. I don't think we should task MN owners with running something like this from home because, frankly, it just makes it harder to get quality masternode performance from home. Also, a 1 Gbps fiber link is simply not available everywhere, and we don't want to restrict the geographical locations of the MN network (it would end up only in the richest countries, which would most likely react to things in a similar fashion, i.e. want to shut down VPSs with crypto on them or some such).

Still, this is very intriguing.

We do not need to require everyone to have a full copy of the entire DASH drive. But we can reward MN operators with such capacity so that we do not rely solely on the three official full copies. Perhaps we can make MNs work like a volume manager in the traditional sense, which decides how many copies to create and the minimum number of copies online. For instance, the default could be five copies in total with three copies online. Users who are really paranoid about their data availability could then subscribe to have more copies on the network. Thus, if one puts more real resources into his/her MNs, he/she gets more reward.
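One way to picture the volume-manager idea: each stored object carries a policy (default five copies total, three online), users can opt into a stricter policy, and the manager schedules new replicas when the policy is violated. The re-replication rule and the proportional reward split below are illustrative assumptions, not a proposed Dash spec; `ReplicationPolicy`, `StoredObject` and `reward_shares` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationPolicy:
    total_copies: int = 5  # default suggested in the post
    min_online: int = 3    # minimum copies that must be reachable


@dataclass
class StoredObject:
    key: str
    policy: ReplicationPolicy = field(default_factory=ReplicationPolicy)
    holders: dict[str, bool] = field(default_factory=dict)  # MN id -> online?

    def online_copies(self) -> int:
        return sum(self.holders.values())

    def is_available(self) -> bool:
        return self.online_copies() >= self.policy.min_online

    def copies_to_create(self) -> int:
        """How many new replicas the volume manager should schedule.

        Assumption: offline copies still count toward the total until they
        are declared lost; dropping below the online minimum also triggers
        re-replication.
        """
        missing_total = self.policy.total_copies - len(self.holders)
        missing_online = self.policy.min_online - self.online_copies()
        return max(missing_total, missing_online, 0)


def reward_shares(objects: list[StoredObject]) -> dict[str, float]:
    """Hypothetical reward split: proportional to online copies each MN serves."""
    counts: dict[str, int] = {}
    for obj in objects:
        for mn, online in obj.holders.items():
            if online:
                counts[mn] = counts.get(mn, 0) + 1
    total = sum(counts.values())
    return {mn: c / total for mn, c in counts.items()} if total else {}
```

A "paranoid" user would simply store objects with, say, `ReplicationPolicy(total_copies=8, min_online=4)`, and MNs holding those extra online copies earn a larger share under this split.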

EDIT: To address your concerns about running this in datacenters, I think VPS providers will probably migrate slowly to Kinetic Open Storage as well, based on this interesting Swift presentation:

TanteStefana · Jan 28, 2016

Yah, and really, it could become a requirement for running a MN once prices are high enough to pay for it.

GermanRed+ · Jan 28, 2016

eduffield said:
I'll look into this.

Thanks, Evan. Whatever storage technology the DASH team chooses to implement DASHDrive, please make sure that it is active-active, highly available and horizontally scalable.

nodeComplex · Jan 28, 2016

To me this feature would be clearer if it were called something like DashDirectory (instead of DashDrive), because it sounds like we're really talking about blockchain storage behind Dash directory services, with each masternode acting as a domain controller (as in Active Directory), not general-purpose drive storage (as in storing petabytes of video files or PowerPoint presentations) with each masternode acting as a file server or storage network.

I'm not a blockchain expert, but I get the impression the blockchain is best thought of as a single table with a structure optimized for its purpose. So:

There is the current financial blockchain, and
there would be a 2nd directory services blockchain.
But if you wanted to store files of any type you would need a 3rd blockchain that resembled a proper file system (like FAT, NTFS).
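The "single table" view can be illustrated with a minimal append-only table in which every row commits to the hash of the previous row, here holding directory-style records rather than files. This is only a sketch of that analogy under simple assumptions (SHA-256 over sorted-key JSON, one record per row); it is not how Dash or DashDrive actually structures its chain, and `ChainTable` is a hypothetical name.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    prev_hash: str
    record: dict

    def digest(self) -> str:
        # Assumption: SHA-256 over canonical (sorted-key) JSON of the row.
        body = json.dumps({"prev": self.prev_hash, "record": self.record}, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()


class ChainTable:
    """A single append-only table where each row commits to the previous one."""

    GENESIS = "0" * 64

    def __init__(self):
        self.rows: list[Row] = []

    def append(self, record: dict) -> str:
        prev = self.rows[-1].digest() if self.rows else self.GENESIS
        row = Row(prev, record)
        self.rows.append(row)
        return row.digest()

    def verify(self) -> bool:
        """Recompute the links; any edited row breaks the chain after it."""
        prev = self.GENESIS
        for row in self.rows:
            if row.prev_hash != prev:
                return False
            prev = row.digest()
        return True
```

Directory entries (users, masternodes) are small rows like these, which is why replicating such a table to many nodes is far cheaper than replicating the contents of a large file system.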

NTFS is not AD and vice versa. It's obviously much quicker to replicate a lightweight directory (like LDAP) across many nodes than to replicate the files of a large file system. There may even be AD installations with 3400 nodes already that have good answers for how to optimize the situation for Dash. The general-purpose storage pursued by STORJ is interesting; they claim high performance, but I am highly skeptical. I need to try it out.
