Distributed platform storage (avoids High Performance nodes)

krilen

Member
There is an ongoing discussion about the introduction of a High Performance masternode (HPMN) type with strengthened hardware resources with the goal of reducing data platform fees. Three options are proposed by Dash Core Group in the Introductory presentation on High Performance Masternodes.
Here, I propose another option that seamlessly distributes platform storage to a subset of nodes, enabling low fees without requiring changes to the masternode network. Also, the level of decentralization can (but does not have to) be selected per data contract.

I am looking for feedback from a platform developer or a technically savvy person on the feasibility of the proposed implementation. I have not studied the Platform code yet, so there may be technical difficulties that I am not aware of. I propose the solution based on high-level assumptions. If any of the explicit or implicit assumptions do not hold, the proposed solution may need to be refined or discarded.

Proposed solution

The key idea is not to store platform data on all nodes or use HPMN, but instead to distribute it randomly and deterministically to a subset of nodes from a deterministic masternode list (*).

(*) Some refer to this type of solution as "sharding", but I will avoid that term. The term is somewhat misleading and overly simplistic because in this case only the storage is sharded, the shards are distributed (the same masternode typically participates in multiple shards), the shards are not defined at the platform/network level, etc.

The subset of nodes that store platform data varies for each data contract. Each node in the subset contains all data of the respective data contract. The size of the subset, i.e. the number of redundant nodes, is defined by the owner of the contract.

How many redundant nodes?

The data contract (DC) owner chooses the number of redundant nodes DC_N when creating the data contract based on the desired level of decentralization. It knows best what the decentralization needs of the application are. If the need is low, it can reduce the fees for users and itself by choosing a low DC_N (i.e. DC_N=10). If it is a critical application that needs to be uncensorable even to governments, it can choose a high DC_N (e.g. DC_N=1000 or even DC_N=MaxInt <=> infinity <=> all nodes in the list)

Which nodes? How are the nodes selected for the subset of masternodes to contain the application data, and how are they tracked?

The nodes used for data storage are randomly and deterministically derived from the hash of the data contract. The number of redundant nodes (n) can be obtained from the metadata of the data contract.
In Dash the list of masternodes is deterministic and their IP addresses are public. An example (in Python for simplicity) of the function that random deterministically builds a subset of masternodes for a data contract is:
Inputs: data contract id (DC_ID), number of redundant nodes (n)

Python:
random.seed(sha256(DC_ID)) # seed te PRG with the hash of DC_ID
subset=random.sample(deterministic_masternode_list, n)

There is no need to index or track which nodes store data for which data contract.
No changes are made to the API with the proposed solution. All requests (queries) can be made to any enabled master node as before (even if it is not included in the subset for that application, i.e. does not have that application data). The additional process of finding a node that actually has the data is performed by the node that receives the query (you can think of this solution as the storage virtualization layer introduced in the middle to make access to stored data seamless and transparent). For the user, the result is as if the data were available on the node itself.
The list of nodes for a data contract can be cached to speed up node selection.

(Masternode churn) What if a node storing DC data becomes unavailable, i.e., is shut down, leaves the masternode network because the owner sold the collateral, etc.?

Do nothing. The churn rate for masternodes is low. When a node leaves, another one takes its place, but it does not initially contain any data, even if designated to do so. The key takeaway is that the new node does not need to synchronize platform data right away. The new node starts empty. When a request for data comes (because the node can be selected), the node cannot provide the data, so another node must be selected from the list.
Similar to Copy-on-Write technique and caching, the node that receives a request and cannot respond will select another node from the list of (redundant) nodes for that data contract, get the data, update its local storage, and respond to the request. Next time, the node will be ready. All of this is transparent to the user making the request.
This affects performance slightly, but is a rare occurrence given the low churn rate of masternodes and typically high data redundancy.

Recap

Two main concepts determine the implementation:
- (1) distribution of application data to a randomly selected subset of nodes
- (2) shifting the responsibility for selecting the targeted number of redundant nodes (degree of decentralization) to the data contract owner
Advantages of the proposed solution:
- less need to increase the storage requirements for masternodes (the load is distributed)
- low fees (again, thanks to the distribution of storage costs)
- no change of the masternode network
- maximum flexibility in decentralization (L1 decentralization is preserved, while L2 applications can set the desired level of decentralization as they wish)
- seamless for users (user interaction with the platform is unchanged, developer interfaces are unchanged, performance loss is likely to be small in practice)
 
Thank you for this extremely intelligently written proposition.

So here's the TLDR but I'll go into detail right after:

As you wrote there are some aspects of the system that you do not know which makes this system not feasible based on how you outlined it above. However many of the ideas presented are in line with how we were planning on doing sharding. However solving some of the problems of sharding are a monumental task. If a community member proposes we do sharding and it wins the vote, DCG will respect it and get to work on sharding, but this will mean a delay of at least 4 months, most likely 6 months to a year. My view on the matter is that we should get everything working in our system first then when usage is too high go to a sharded solution which represents horizontal scaling.

Long version, I'm going to be thorough:

Platform works by using threshold signatures in a fixed sized quorum. In a quorum each member votes on blocks by producing signatures and then propagating them, these signatures are threshold recovered to produce 1 signature proving that 2/3rds + 1 of the quorum have agreed on something to be true. This signature signs the block id which is a hash of the block header and state information. Inside the state information is the root app hash, and this root app hash represents the root of the merkleized groveDB structure. This means we will be able to prove anything in the state with a grovedb proof going to the root of the "tree" (grovedb is actually a multidimentional directed acyclic graph) and the signature signing the root.

Right now we pseudorandomly choose 100 members out of the DML (deterministic masternode list) to form a quorum. Quorums currently last for 1 day, and we have 24 of them. Pretty much 1 is created every hour. Because of quorums being completely destroyed after a day it's very hard to do sharding. Let's imagine you put a subset of data on a quorum, but then this quorum is replaced with an entirely different quorum, all the data would need to be migrated to the new quorum every day. So this is obviously not the right solution. Instead what is needed is to have the quorum only kick out nodes that weren't doing their job and/or had decided to leave. And in here lies the difficulty. We would need a robust PoSe solution to know which MNs aren't properly participating, or not participating at all. And then we would need to have a way to prove this information securely to our core software which currently has the responsibility of creating quorums. And then we would need to alter the selection mechanism of quorums.

There are also other difficulties about sharding that need to be considered. Each subset needs to have the validator list of every other subset, so they can pretty much talk to each other through IBC. IBC needs to work (currently might not, was planned for post release).

One thing mentioned above that is very great with sharding is that you might be indeed able to duplicate data. But this isn't a given. Mostly because two shards of the network might process state transitions in a different order leaving data in slightly different states. To do duplicate data contract data between shards you would need a mechanism for copying data between them directly. So this is also more work but a great feature!
 
Thank you for taking the time to respond. You make a heroic effort to respond to all the posts that are thrown at you. My hat is off to you.
I also apologize for being so late in replying, but I have had to meet my work commitments which have taken up all my time lately.

After reading your reply, some details have become clearer to me. Thank you very much. I also took a quick look at the Dash Platform documentation.
However, my proposal still seems very workable and easy to implement.

I propose to implement platform data distribution (sharding, if you will) in such a way that quorums, state transitions and grovedb structures are not affected. The APIs for users and developers remain unchanged. DAPI remains unchanged. Documentation remains unchanged. State transitions remain unchanged. Proofs remain unchanged and all structures remain unchanged. Blocks are identical, as are signatures and root app hashes.

The platform data distribution I propose has nothing to do with Dash Platform Protocol or TenderDash. It takes place at a lower level (think of it as a virtualization layer on top of MN 's file system write and read system calls). This happens after the rules have been successfully validated by the Dash Platform Protocol. When grovedb wants to write the 'document' type payload to disk, it first checks if MN is in the subset of the associated data contract and writes to disk only if it is, otherwise it is ignored.

When a get(document) or query arrives, the same is done in reverse order. If MN is in the subset, the data is read from disk as before. If not, the data is not available and normal execution is interrupted. A request for missing data is made to another random MN from the subset of this data contract (or optionally from a cache table to increase performance) and then answered.

Can you please think about this again? Is there anything else I am missing?

As for a robust Proof-of-Service I have some half-baked ideas for a simple implementation, but that's another topic. I'll be happy to elaborate when we get to that point.
 
Sorry, I just noticed your reply, I am about to go to sleep, but will respond when I wake up.
 
Let me try to give some additional explanation of the proposed solution to make it more understandable and add some additional thoughts on the subject.

The goal is to split the platform data between the masternodes, so that each masternode does not need to store the whole platform state, but only a part of it. We have many options for how to split the data. Here I propose to use the Data contract ID as the value by which we can decide which data goes to which masternode. I assume that there will be many data contracts (DC) and that most of the data will go through them. With this solution (at least in its current form), the interactions that store/modify data that do not have a Data contract ID will not be sharded, i.e. they will be on all MNs.
The core idea is that MN does not need to store information about data contracts that are not assigned to it.

The sharding concept - illustrated

Here is a graphical illustration of current situation (all masternodes store the entire Data platform state):

all.png


Here is the same situation with my proposal (gray squares are data contracts for which the payload is not stored):

sharded.png

The number of nodes that store a data contract's data is chosen by data contract owners. This is just an example.

Additional benefits of the solution

  • Hosting providers and whales can not 'cut corners'
When multiple masternodes are hosted by the same entity, it is possible to cut corners, i.e. reduce storage costs, by sharing platform data across multiple masternodes. This reduces data redundancy and thus network robustness, security and performance.
In the proposed solution, platform data is sharded so that MNs do not have the same data (the platform database file is different for each node). Platform data sharing between MNs by hosting providers is no longer an issue.
  • Easy to implement
Minimal or no changes to the existing platform architecture and data structures are required. The most important function to implement is to check if a MN is in the subset for a DC (see the two lines of code in the first line for a first hint) and to forward requests to other MNs if necessary.
  • Can be combined with other solutions
The proposed solution is independent and fully compatible with other proposed solutions, including time locking collaterals, introducing high performance nodes, etc.

Caching (optional, can be added later)
Suppose that one day we have thousands of data contracts on the platform, but only a few of them are heavily used on a daily basis. With the proposed solution, a simple form of caching can be implemented to increase network performance quite easily by creating the database for a subset of the frequently used DCs. This cache is usually the same for all MNs in DML, but not necessarily. When a request arrives, MN first looks in the cache, then, if not found there, in its subset of DCs, and then continues as usual.

Solution Problems
  • Different storage requirements for MNs (unfairness)
The subset of DCs allocated to a MN is randomly determined. If the number of data contracts registered on the platform is small and/or the distribution of storage requirements between data contracts is highly skewed, i.e. only a few DCs store large documents (require much more storage than the average), then I might be the unlucky one MN, storing much more data than the average MN.
I expect that with mass adoption there will be many different data contracts registered on the Dash platform so this will be less of an issue.
  • Provable inter - data contract queries. I do not know if this is even required, currently being considered or implemented, but it could cause some difficulties with this solution.
 
There is only one severe drawback in your solution. Bandwidth speed.

It is a lot faster (in terms of bandwidth speed) when you download a piece of data from a single node, rather than downloading the same piece of data from multiple nodes (especially in case these multiple nodes have not identical bandwidth capacity).

Even in case these multiple nodes have identical bandwidth, there is still a CPU delay when trying to concatenate the data, but this delay is not an issue if you have a decent CPU and the data piece you target is not dispersed among too many nodes.
 
Last edited:
There is only one severe drawback in your solution. Bandwidth speed.

It is a lot faster (in terms of bandwidth speed) when you download a piece of data from a single node, rather than downloading the same piece of data from multiple nodes (especially in case these multiple nodes have not identical bandwidth capacity).

Even in case these multiple nodes have identical bandwidth, there is still a CPU delay when trying to concatenate the data, but this delay is not an issue if you have a decent CPU and the data piece you target is not dispersed among too many nodes.

This is true, but should be solved by:
(a) caching
(b) high frequency applications can directly contact MNs in their subset (avoiding the additional hop)
 
So excellent ideas once again. But sadly there are a few things that you do not know that render this solution not as ideal as you might hope, and as far I can see right now doesn't seem to work.

Let's dive into them. I said this before

Inside the state information is the root app hash, and this root app hash represents the root of the merkleized groveDB structure. This means we will be able to prove anything in the state with a grovedb proof going to the root of the "tree" and the signature signing the root.

It basically means that all data hashes must be on every node so each node can recreate the root app hash. You might think to yourself : that's fine, just store the hashes, and leave the payload empty, and you would be somewhat right. There are some contracts where this might be possible. One difference is that patches to data become impossible. Let's imagine you have a big document, and you want to just change 3 bytes at the end. Instead of just sending a 3 bytes patch, you would need to send the whole document again, and then grovedb would need to know the difference to indexes, so it would ideally need the old document to do a diff? Not sure how this would be possible, getting it from another node would be too expensive.

Next let us think what you are trying to achieve. GroveDB uses secondary indexes based on your data contract. These secondary indexes actually use up a lot of space, and are just hashes. These are not something that could be removed as removing them would influence the state.

At this point I don't think the advantages of not storing the document are really worth it. If we take Dashpay I think the contact request represents less than 50% of the overall data stored. (Would need to verify).

So upsides... small... disadvantages... well not sure it even works, and if it did then you would limit the nodes you could query quite for any data.

Next, quorums rotate. So we would need to build a system where they stay relatively stable in the data they are supposed to host.
 
One difference is that patches to data become impossible. Let's imagine you have a big document, and you want to just change 3 bytes at the end. Instead of just sending a 3 bytes patch, you would need to send the whole document again, and then grovedb would need to know the difference to indexes, so it would ideally need the old document to do a diff? Not sure how this would be possible, getting it from another node would be too expensive.

Next let us think what you are trying to achieve. GroveDB uses secondary indexes based on your data contract. These secondary indexes actually use up a lot of space, and are just hashes. These are not something that could be removed as removing them would influence the state.

And what if you quit GroveDB and choose another Database for DashPlatform?
Why GroveDB suits better for DashPlatform, than any other database?

GroveDB said:
Hierarchical Authenticated Data Structure with Efficient Secondary Index Queries GroveDB is a database system designed specifically for efficient secondary index queries, proofs, speed, and reliability. It was built for use within Dash Platform. Secondary indices are crucial to any database management system. All previous solutions had certain tradeoffs depending on the problem they were trying to solve.
GroveDB is a classic time-space tradeoff. It enables efficient querying on secondary indices by precomputing and committing to them. A subtree of each possible queryable secondary index (up to a cap) is built and committed to into our authenticated data structure.

Why searching on secondary indices is so important to you?
Why searching on EACH POSSIBLE secondary index is so important to you?
What kind of problem you were trying to solve, and you started coding GroveDB, nine months ago?


QuantumExplorer@DashDiscord said:
Data is stored as nodes of data in a merkle-ized provable database the first of its kind, found here: https://github.com/dashpay/grovedb. Masternodes are aware as much as anyone of the contents of data if such data is not encrypted. Platform does have the ability to have users store encrypted data that isn't directly linked to them. However it can always be figured out who wrote whatever blob of encrypted data.

Another alternative is for the agents to scan the content of the platform, delete only whatever unencrypted data they do not like, while keeping the encrypted data in place hoping they will succeed to decypher the data oneday and smash those who posted them.

Nine months ago.....and the GroveDB baby was born........
de48guc-8f8a125c-710f-4c5f-a3e1-ab5df34dd752.jpg
.


Could you release DashPlatform, without having a so highly searchable database, as GroveDB is?
What kind of Database was used in DashPlatform's testbed before the arrival of GroveDB, and for what reasons this database solution was rejected?

Turning a MongoDB Replica Set into a Sharded Cluster | Severalnines
 
Last edited:
And what if you quit GroveDB and choose another Database for DashPlatform?
Why GroveDB suits better for DashPlatform, than any other database?
There is no other database available, GroveDB is the first of its kind using hierarchal authenticated data structures to offer various kinds of proofs.

What kind of problem you were trying to solve, and you started coding GroveDB, nine months ago?
When you make a query to DAPI, you want to do this in a decentralized fashion without the need for a centralized authority. In order to prove that you got the data you wanted and that no other data exists for your query you need secondary indexes. Without them, if you made a request for example: "get me all people who send me a contact request since yesterday", groveDB can prove that the elements returned are both correct and complete. Without secondary indexes the way we built them you could maybe get elements proved to be correct, but not complete, or complete but not correct.

Before GroveDB, it was a mess. There were 4 concurrent databases mimicking the effect we wanted, but that also didn't support proofs. I had been saying that it was impossible for us ever to release without this. I became CTO, hired the people we needed, and built it with their help.
 
There is no other database available, GroveDB is the first of its kind using hierarchal authenticated data structures to offer various kinds of proofs.


When you make a query to DAPI, you want to do this in a decentralized fashion without the need for a centralized authority. In order to prove that you got the data you wanted and that no other data exists for your query you need secondary indexes. Without them, if you made a request for example: "get me all people who send me a contact request since yesterday", groveDB can prove that the elements returned are both correct and complete. Without secondary indexes the way we built them you could maybe get elements proved to be correct, but not complete, or complete but not correct.
Correct, but not complete ? How is it even possible a non complete answer of elements, to be correct? A non complete answer is incorect by definition. Anyway....You mean that you can get correct elements, but not all of them (or by saying "complete but not correct" you mean that you can get previously correct elements that are currently deleted and thus incorrect)...Anyway...

The DashPlatform database is supposed to be replicated and synchronized among all the nodes, isnt it? As long as it is synchronized, then how is it possible (in whatever decentralized node you may ask the querry) to get non complete elements? If you want to do a query in a decentralized fashion without the need for a centralized authority , you just ask several random nodes, and whatever the answer is, this answer is supposed to be identical and complete simply because the synchonization is supposed to work. This is how we retrieve data from the blockchain after all, we connect to some random nodes that are synced and we compare their answer.

You know what? Your problem is created due to the lack of Proof of Bandwidth Service. If a proof of bandwidh service had been implemented, then you would be sure that the nodes were fully synchronized (because the nodes that do not fullfill some minimum bandwidth requirements they will be rejected from the quorum). Now that there is no proof of Bandwidth service, you dont know for sure whether the nodes you query are fully synchronized or not, and you try to solve this with groveDB.

Although I still dont understand how groveDB can solve bandwidth synchronization problems among the quorum members. A low bandwidth quorum member could not even download the indices , which you admited they use a lot of space and thus they may delay to be transmitted. Have you ever tried to test DashPlatform+groveDB by simulating quorum members that have low or irregular bandwidth? Where is your testbed? Are you sleeping on it?

So you gave low priority in the development of a proof of Bandwidth service, and a high priority in the development of a fully, totally, ultimately searchable database (the first of its kind!). Why?
Another alternative is for the agents to scan the content of the platform, delete only whatever unencrypted data they do not like, while keeping the encrypted data in place hoping they will succeed to decypher the data oneday and smash those who posted them.



Nine months ago.................................................................................................................................................
...after draining for 6 years all the money from the stupid masternodes, the GroveDB baby was born.
de48guc-8f8a125c-710f-4c5f-a3e1-ab5df34dd752.jpg
.

 
Last edited:
So excellent ideas once again. But sadly there are a few things that you do not know that render this solution not as ideal as you might hope, and as far I can see right now doesn't seem to work.

Thank you very much for the answer, I appreciate it. I still think the solution is doable, but before I ask any more questions, I should really look at the code now. I do not want to take up any more of your time so don't feel obligated to reply.

What I will look at in the code is how the grovedb is structured in detail with respect to multiple data contracts. There is no such information in the documentation. The example for provable data queries is for one data contract only, not for the entire platform with many independent data contracts. How multiple data contracts are stored and accessed in grovedb is not documented, but has implications for the implementation of my solution.
After looking at the code, here is what I will be interested in next.
I understand that we have a merkelized groveDB structure that is necessary for proving queries, but can we assume that most if not all provable queries refer to individual data contracts and not the entire platform state? Are the data contracts largely independent of each other? What is the use case for provable queries that span multiple data contracts? How common are cross-data contract queries? What is the optimal grovedb structure?

It basically means that all data hashes must be on every node so each node can recreate the root app hash. You might think to yourself : that's fine, just store the hashes, and leave the payload empty, and you would be somewhat right. There are some contracts where this might be possible. One difference is that patches to data become impossible. Let's imagine you have a big document, and you want to just change 3 bytes at the end. Instead of just sending a 3 bytes patch, you would need to send the whole document again, and then grovedb would need to know the difference to indexes, so it would ideally need the old document to do a diff? Not sure how this would be possible, getting it from another node would be too expensive.


Does each data contract have its own subtree with secondary indexes? If so, then "all" data hashes do not need to be on each node. If my node is not in a subset for a DC, then it does not contain any data from that DC, but it also does not store any secondary indexes because it does not need to respond to queries related to that DC. As for updating, that is, patching the big document you mentioned, it belongs to a data contract that is not in my subset (not assigned to me), which means I do not need to patch the document. I simply forward the request (3 bytes) to another node and the node responds with a modified top hash for the subtree of the data contract (I know how to change the rest of the structure in grovedb). Communication is not expensive. I send 3 bytes and get back a hash.

Next, quorums rotate. So we would need to build a system where they stay relatively stable in the data they are supposed to host.

Are not the quorums in Tenderdash only used for consensus?
 
a request for example: "get me all people who send me a contact request since yesterday", groveDB can prove that the elements returned are both correct and complete. Without secondary indexes the way we built them you could maybe get elements proved to be correct, but not complete, or complete but not correct.
So, in GroveDB, there is a primary index on certain primary key, which is basically a selected attribute of a given type. Which one is that in your example? Then there is a secondary index on another attribute type. Which one is that? I understand that an index is a redundant data structure that helps to search fast. It is not immediately obvious however, how an index proves correctness and/or completeness. Can you explain that?
 
So, in GroveDB, there is a primary index on certain primary key, which is basically a selected attribute of a given type. Which one is that in your example? Then there is a secondary index on another attribute type. Which one is that? I understand that an index is a redundant data structure that helps to search fast. It is not immediately obvious however, how an index proves correctness and/or completeness. Can you explain that?

It is not a simple index. It is a merkle tree (also called hash tree).
 
Sharding is cool, but what about ZK rollups?

Zero-Knowledge rollups | ethereum.org

or optimistic rollups?

Optimistic Rollups | ethereum.org

ZK rollups are specific to transactions. As I understand it, it is a Layer 2 solution that batches transactions in a Merkle tree and stores the batch with the root hash in a Layer 1 smart contract. The smart contract verifies the snark proof submitted with a batch before accepting and updating the state of the contract. The main motivation is scaling Ethereum and reducing transaction fees. The savings come from the fact that batch transactions require less data to be stored per transaction (e.g., signatures and some other transaction parameters are no longer needed) and the data is compressed.

In our case, we store documents with generic data according to a data contract that the app developer is free to envision.

Anyway it is unrelated and independent from this proposal. Both solutions can coexist and result in cumulative savings. In my solution, the data, compressed or not, rolled up or whatever, is simply distributed (sharded) based on the desired decentralization level specified by the app developer.
 
there is still a CPU delay when trying to concatenate the data, but this delay is not an issue if you have a decent CPU and the data piece you target is not dispersed among too many nodes.
 
Back
Top