6/20 RC3 Post-mortem

eduffield

Core Developer
Masternode Forking Issues

When we launched the Masternode payment system today, the network exhibited some instability issues similar to those we experienced with the last fork. This new instability was nowhere near as serious as the first implementation, but we erred on the side of caution and disabled Masternode payments for the time being.

Immediately after the network problems were noticed, many users sent debug information from their clients. After a few hours of analysis, we discovered the root cause of the forks.

Two blocks are solved at nearly the same moment on the network, and both are propagated and accepted by the network. In the current implementation, both blocks have the same hash, but in these blocks there's some discrepancy about who to vote for.

In one block the miner votes for 88802 and 88803; in the other the miner abstains from voting. When the next block is solved, it's based on one of the older blocks, so half of the network believes the miner cheated and rejects the block, causing a fork.

Although the network was “pruning” the bad forks as intended, the amount of time it was taking to do so was beyond the confirmation window. This was untenable, so we decided to revert.

The solution to this is straightforward: any changes to the votes must also change the hash of the block, which will prevent the network from thinking these two blocks are the same. Next week we will begin testing code to fix this issue. This will include setting up hundreds of daemons and several more pools on testnet to better simulate mainnet. Barring any new issues, we should be ready to launch in 2-3 weeks.
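A minimal sketch of the idea, not Darkcoin's actual code: if a block's identifier covers only the header, two blocks that differ only in their votes look identical, while committing the votes into the hashed data separates them. The field names, the `votes_commitment` key, and the use of SHA-256 in place of the coin's real proof-of-work hash are all assumptions made for illustration.

```python
import hashlib
import json


def double_sha256(data):
    """Stand-in hash for illustration; the real client uses a different PoW hash."""
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()


def id_without_votes(block):
    """Identifier that ignores the votes (the behaviour described as the bug)."""
    header = block["header"]
    return double_sha256(json.dumps(header, sort_keys=True).encode())


def id_with_votes(block):
    """Identifier that also commits to the votes (the proposed direction of the fix)."""
    commitment = hashlib.sha256(json.dumps(block["votes"]).encode()).hexdigest()
    header = dict(block["header"], votes_commitment=commitment)  # hypothetical field
    return double_sha256(json.dumps(header, sort_keys=True).encode())


# Two hypothetical competing blocks: same header, different votes.
header = {"prev": "00" * 32, "merkle_root": "ab" * 32, "time": 1403280000, "nonce": 42}
block_a = {"header": header, "votes": [88802, 88803]}  # miner votes
block_b = {"header": header, "votes": []}              # miner abstains

print(id_without_votes(block_a) == id_without_votes(block_b))  # True: indistinguishable
print(id_with_votes(block_a) == id_with_votes(block_b))        # False: now distinct
```

With the votes committed, a node can no longer treat the two blocks as the same block with conflicting contents; they become ordinary competing blocks.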

If interested, here’s the debug information we used to track down the issue: http://pastebin.com/QmbM8dPH

New Developers

We are happy to announce the addition of two developers to the Darkcoin team: David (DRKLord) and Fabian (CHAOSiTEC).

DRKLord brings about 10 years of experience with C, C++, x86/64 ASM and many other skills to the team.

CHAOSiTEC is a longtime Darkcoin supporter with extensive knowledge of cryptocurrency and programming.

We’re really excited that a core team is coming together and we’re looking forward to seeing what we can accomplish.
 
Yeah, but we learned two important things. First, the rare event where a block is solved by two nodes at the same time (probably super rare in testnet, as it never happened there) can cause a glitch in the voting. Second, the system did eventually solve the problem, though it took a little too long. It looks like Evan has a plan for how to fix this. It looks like we have a refinement issue, not a major out-of-left-field issue.

I'll bet that if Evan gets it fixed to his satisfaction, he'll schedule a re-launch within a week (of feeling confident the issue is solved). I know a week seems like a long time, but remember, the 3rd time is the charm :tongue: :wink: :grin: :smile:
 
No more launches, ok? Just stop with the launching stuff...you used up all your launches...stop now.
And no more dates, you're out of dates too..

From now on, how about you do "Live testing" or maybe "active audits"..."security simulation".... Something so that when you find a bug, it's a success and not another one of these Epic Crypto Fails.

I'm only saying this because I love you, but yeah, don't put us through another one of these launch date face-smashers. At this point we have to assume the next release will also have problems, so basically I'm suggesting you build that into the plan.
 
Be patient. It's new software. If you keep a cool head it's not hard to see that these are just bumps in the road; the project stays sound.
This experience shakes out speculators with no interest in the coin.
The team is constantly growing. The project is growing. And we are in perfect sync with the upcoming BTC bubble.
Relax. Enjoy. :smile:
 
Thanks for the great analysis!

I've seen this behaviour of the network, blocks with an identical hash but different blocktemplates (votes), in testnet with 70 clients too, but did not draw the right conclusion that it might be a problem. I assumed that it was part of normal network operation, gathering consensus for the votes. Mainnet proved me wrong :smile:

And I think you are right: the fix should be quite easy. Remembering the "blockchain in a blockchain" picture we had for the voting system, the state of the inner votechain must influence the hash of the outer blockchain. Quite as easy as that.

Looking forward to getting RC3.1 in testnet.

Cheers,
Holger

EDIT: Even now, with testnet having 5~6 masternodes, I get different blocktemplates:

Node1:

Code:
    "votes" : [
        "f4590000000000001976a914ecbade2ca5d03f145e3fafc632da3b698053952088ac04000000",
        "f5590000000000001976a91497f4cfe44cd5eb16df9c9d8f01cd22a6c78c8e5888ac03000000",
        "f6590000000000001976a9147f898281b5ca5b7ec9385e04421b4e2a1aa008d488ac02000000",
        "f7590000000000001976a91497f4cfe44cd5eb16df9c9d8f01cd22a6c78c8e5888ac01000000"
    ],
    "payee" : "mxv11kABs5n2aLWHytKgFQvESd2f9vwywL",
    "masternode_payments" : true
}


Node2:

Code:
    "votes" : [
        "f4590000000000001976a914ecbade2ca5d03f145e3fafc632da3b698053952088ac04000000",
        "f5590000000000001976a91497f4cfe44cd5eb16df9c9d8f01cd22a6c78c8e5888ac02000000",
        "f6590000000000001976a9147f898281b5ca5b7ec9385e04421b4e2a1aa008d488ac02000000",
        "f7590000000000001976a91497f4cfe44cd5eb16df9c9d8f01cd22a6c78c8e5888ac01000000"
    ],
    "payee" : "mxv11kABs5n2aLWHytKgFQvESd2f9vwywL",
    "masternode_payments" : true
}

Notice the line "f559000...": node 1 has 3 votes, node 2 has 2 votes. So this behaviour can already be reproduced in testnet with a small number of masternodes :smile:
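To make the difference easier to spot, here is a small sketch that decodes the two "votes" lists and reports entries that disagree. The field layout is an assumption inferred only from the strings above and the vote counts mentioned: an 8-byte little-endian block height, a 1-byte script length, the payee script, then a 4-byte little-endian vote count. The function names are hypothetical.

```python
import struct


def parse_vote(hex_entry):
    """Decode one vote entry under the ASSUMED layout described above."""
    raw = bytes.fromhex(hex_entry)
    (height,) = struct.unpack_from("<q", raw, 0)      # assumed: 8-byte LE height
    script_len = raw[8]                               # assumed: 1-byte script length
    script = raw[9:9 + script_len]
    (votes,) = struct.unpack_from("<i", raw, 9 + script_len)  # assumed: 4-byte LE count
    return {"height": height, "script": script.hex(), "votes": votes}


def diff_votes(entries_a, entries_b):
    """Return (height, votes_a, votes_b) for heights where the two nodes disagree."""
    a = {v["height"]: v for v in map(parse_vote, entries_a)}
    b = {v["height"]: v for v in map(parse_vote, entries_b)}
    return [(h, a[h]["votes"], b[h]["votes"])
            for h in sorted(a) if h in b and a[h] != b[h]]


node1 = [
    "f4590000000000001976a914ecbade2ca5d03f145e3fafc632da3b698053952088ac04000000",
    "f5590000000000001976a91497f4cfe44cd5eb16df9c9d8f01cd22a6c78c8e5888ac03000000",
    "f6590000000000001976a9147f898281b5ca5b7ec9385e04421b4e2a1aa008d488ac02000000",
    "f7590000000000001976a91497f4cfe44cd5eb16df9c9d8f01cd22a6c78c8e5888ac01000000",
]
node2 = [
    "f4590000000000001976a914ecbade2ca5d03f145e3fafc632da3b698053952088ac04000000",
    "f5590000000000001976a91497f4cfe44cd5eb16df9c9d8f01cd22a6c78c8e5888ac02000000",
    "f6590000000000001976a9147f898281b5ca5b7ec9385e04421b4e2a1aa008d488ac02000000",
    "f7590000000000001976a91497f4cfe44cd5eb16df9c9d8f01cd22a6c78c8e5888ac01000000",
]

print(diff_votes(node1, node2))  # [(23029, 3, 2)]
```

Under that assumed layout, only the entry for height 23029 differs, which matches the "3 votes vs 2 votes" observation.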
 
While I agree that there is room for improvement in how we manage the expectations we set, there is simply no way around issuing launch dates for planned hard forks. Pools, exchanges, and users require ample lead time to update their wallets.
 
+1 Everyone needs to have ample time to update.
But of course the update must take into account that there will always be problems caused by people not updating. That will never go away! :wink:
 
It's all about expectations. You announce that you hope to get it right but, as has been seen so far, this is a tricky update and might well need patching.

One solution would be to do what Vertcoin does with merge mining of their testbed coin. They apply the changes to the merge-mined test coin and thus the mainnet is free from anomalies. The test coin also has much more real-life correlation than a testnet coin because it is actually mined by real-life clients/pools etc. The merge-mined coin is also traded on Poloniex btw.
 
Let's roll up our sleeves and get down and dirty back on testnet! Unfortunately, I won't be able to fire up the nodes until I get back Monday.
 
While this has been yet another disappointment, it is always good to know that there's light at the end of the tunnel. I just wish that we get there sooner rather than later.
 
Seems like maybe 72 hours is enough time? I know I certainly never got any lead time when the wallets updated in the past. And Derk is correct, isn't he? That the hard forks have to deal with people/pools/exchanges not updating as they should anyway?

I mean...is the plan to basically just redo this all again in 2-3 weeks?
Let's keep Evan's reputation in mind and make arrangements so that failure isn't even possible. For instance: no more launch, how about a "live debugging session"? If there are no bugs then great, success! And if we find a bug, great, successful debugging session!

If the team is heart-set on "launching" something, launch something that's a 100% lock. A new logo or color scheme, a new startup graphic, I don't care, just a win ffs, no matter how trivial.
 
It's all about expectations. You announce that you hope to get it right but, as has been seen so far, this is a tricky update and might well need patching.

One solution would be to do what Vertcoin does with merge mining of their testbed coin. They apply the changes to the merge-mined test coin and thus the mainnet is free from anomalies. The test coin also has much more real-life correlation than a testnet coin because it is actually mined by real-life clients/pools etc. The merge-mined coin is also traded on Poloniex btw.

Sounds like an excellent idea. Anybody know the difficulties or the downsides of this?
 
But there was no fork in testnet.

I have checked 3 nodes in mainnet.
Code:
chaeplin@x60t:~> grep REORGANIZE .darkcoin/debug.log
2014-06-20 16:18:22 REORGANIZE: Disconnect 1 blocks; 00000000000212dcf55b5f098c73f0821880ccc545f6a90794693aa02794c67a..
2014-06-20 16:18:22 REORGANIZE: Connect 2 blocks; ..00000000000022dacba046c9bad97104b4ca97226b90fad6f8ad31d0400d692b
2014-06-20 16:37:04 REORGANIZE: Disconnect 1 blocks; 00000000000022dacba046c9bad97104b4ca97226b90fad6f8ad31d0400d692b..
2014-06-20 16:37:04 REORGANIZE: Connect 2 blocks; ..00000000000d82a5cd6f1bd0314a9ccd2a2d513d2ef6cecd852a454a6c7e2c5b

Code:
mainnet@ip-172-31-30-7:~$ grep REORGANIZE .darkcoin/debug.log
2014-06-20 16:31:02 REORGANIZE: Disconnect 1 blocks; 00000000000022dacba046c9bad97104b4ca97226b90fad6f8ad31d0400d692b..
2014-06-20 16:31:02 REORGANIZE: Connect 2 blocks; ..00000000000d82a5cd6f1bd0314a9ccd2a2d513d2ef6cecd852a454a6c7e2c5b
2014-06-20 17:03:00 REORGANIZE: Disconnect 1 blocks; 00000000001392f1652e9bf45cd8bc79dc60fe935277cd11538565b4a94fa85f..
2014-06-20 17:03:00 REORGANIZE: Connect 2 blocks; ..000000000015426875b0125a3d0e4f0bcea496ef77d83137aac407792b85ebb5

Code:
mainnet@ip-172-31-4-239:~$ grep REORGANIZE .darkcoin/debug.log
2014-06-20 17:03:00 REORGANIZE: Disconnect 1 blocks; 00000000001392f1652e9bf45cd8bc79dc60fe935277cd11538565b4a94fa85f..
2014-06-20 17:03:00 REORGANIZE: Connect 2 blocks; ..000000000015426875b0125a3d0e4f0bcea496ef77d83137aac407792b85ebb5
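For comparing many nodes at once, the grep output can be turned into structured records. This is only a sketch: the regular expression is written against the log lines shown above, the function and field names are hypothetical, and it does not try to interpret which block each hash refers to.

```python
import re

# Matches lines such as:
# 2014-06-20 16:18:22 REORGANIZE: Disconnect 1 blocks; <hash>..
# 2014-06-20 16:18:22 REORGANIZE: Connect 2 blocks; ..<hash>
REORG_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) REORGANIZE: "
    r"(Disconnect|Connect) (\d+) blocks; (?:\.\.)?([0-9a-f]+)(?:\.\.)?$"
)


def parse_reorgs(lines):
    """Pair each Disconnect line with the Connect line that follows it."""
    events, pending = [], None
    for line in lines:
        m = REORG_RE.match(line.strip())
        if not m:
            continue
        ts, kind, count, block_hash = m.groups()
        if kind == "Disconnect":
            pending = {"time": ts, "disconnected": int(count),
                       "disconnect_hash": block_hash}
        elif pending is not None:
            pending.update(connected=int(count), connect_hash=block_hash)
            events.append(pending)
            pending = None
    return events


sample = """\
2014-06-20 16:18:22 REORGANIZE: Disconnect 1 blocks; 00000000000212dcf55b5f098c73f0821880ccc545f6a90794693aa02794c67a..
2014-06-20 16:18:22 REORGANIZE: Connect 2 blocks; ..00000000000022dacba046c9bad97104b4ca97226b90fad6f8ad31d0400d692b
2014-06-20 16:37:04 REORGANIZE: Disconnect 1 blocks; 00000000000022dacba046c9bad97104b4ca97226b90fad6f8ad31d0400d692b..
2014-06-20 16:37:04 REORGANIZE: Connect 2 blocks; ..00000000000d82a5cd6f1bd0314a9ccd2a2d513d2ef6cecd852a454a6c7e2c5b
""".splitlines()

for event in parse_reorgs(sample):
    print(event["time"], "-%d" % event["disconnected"], "+%d" % event["connected"])
```

Feeding each node's debug.log through the same function makes it straightforward to line up reorganizations by timestamp across nodes.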
 
But there was no fork in testnet.
Yeah, maybe there are not enough distinct miners with diverging masternode lists in testnet: just me, nomp, nomp2 + p2pool. That may not be sufficient to produce a fork. We need to distribute the hashing power to more miners.

I will patch a client to force diverging masternode/vote candidate lists, set up four of them as miners and put some MH/s on them.
 
Ah... we have...

I have reindexed 3 testnet nodes.
One of them got this.
Code:
ubuntu@ip-172-31-30-7:~$  grep REORGANIZE .darkcoin/testnet3/debug.log
2014-06-21 09:24:50 REORGANIZE: Disconnect 1 blocks; 00000000fd93e980730f288f1a640fc524b86f5e2fc99005bdf5b3a1d4d7aeea..
2014-06-21 09:24:50 REORGANIZE: Connect 2 blocks; ..000000018a59904ad1e02ae1cfdf30b17fb6f740c574ff31c46ab1e0b754b1d6

SetBestChain: new best=00000000fd93e980730f288f1a640fc524b86f5e2fc99005bdf5b3a1d4d7aeea  height=19451  log2_work=44.693617  tx=33603  date=2014-06-14 17:43:42 progress=0.993643

SetBestChain: new best=000000018a59904ad1e02ae1cfdf30b17fb6f740c574ff31c46ab1e0b754b1d6  height=19453  log2_work=44.693717  tx=33608  date=2014-06-14 17:47:22 progress=0.993646
 
Keep in mind, all pools, all exchanges, all merchants, and so on need enough time to prepare for forks. Being on the wrong chain is no fun at all. Announcing a fork within only 2-3 days (we had that once accidentally) is a huge mess for the network!

I remember that quick fork. THAT fork went smoothly actually...smoothest one of them all perhaps. The forks with the long announcement times have failed the worst, no?

Look, Mintpal caused a panic today by not being fully prepared, and they had plenty of time...like a month! They probably could have done as well with a last-minute phone call. So it's not a lead time issue, right?

I'm concerned if the plan from here is actually just "Third time's a charm!"
 