DarkCoin FPGA Mining Co-op?

Again sorry to the folks that PM'ed me and I haven't responded. I'm going to vomit everything relevant into a single post here instead of replying individually.

It sounds like there are a lot of HW/FPGA centric folks that want to be involved. That's cool. The first thing we should be looking for is a true X11 specification. The web sleuthing I've done so far turned up nothing. I've found things called "specification" but they're just marketing fluff that doesn't help developers in the least. What would be ideal is a document that spells out explicitly the format of the input digest to the hash chain, the exact order of the hashes, which variants of the hashes are used, etc. If anyone knows of such a document, please chime in.

Barring the discovery of such a document though we can use the existing open source Darkcoin SW miner code to reverse engineer these details. This is how this thread started. I was hoping for someone with better SW skills than myself to help with the initial definition (reverse engineering) of the SW implemented hashchain so that we could duplicate it in HW.

So, maybe it would be best to make a "Statement of Work" to say exactly what we're after. If we find an interested SW developer we could show our appreciation by pooling altcoins, giving away FPGA mining HW, beer, weed, percentage of FPGA mined Darkcoins in the future, or some such thing. Here's how I would define the required SW task:

- Start mining Darkcoins using the open source code at https://github.com/elmad/darkcoin-cpuminer-1.3-avx-aes (or any other open source code that exposes the X11 algorithm). This will require compiling the source code, creating a Darkcoin wallet to point to, and connecting to a mining pool. This requires no special HW and can be done on any PC. It will be slow on a standard PC, but speed at this stage is irrelevant. This stage is intended for simple data gathering, and working on a standard PC is probably optimal.
- Work through the source code and specify the format of the input digest to the hash chain, the exact order of the hashes, which variants of the hashes are used, etc.
- Modify the code to dump data streams of nonces moving through the hashchain. For a given input digest, show all intermediate hash values for a sequence of nonces. Verify the integrity of the modified X11 hashchain by verifying accepted pool shares. This dataset will be important as it can be used to verify the FPGA implementation at a low level. We can split the FPGA work into blocks and use this test data to verify a block at a time.
- Modify the open source getwork scripts currently used on other altcoins to interface to Darkcoin specific FPGA HW. This might be over a serial connection or through an Altera USB Blaster cable. This is likely not as arduous as it sounds because it won't need to be much different from what's out there for other altcoins and it can rely on a proxy_miner type program if necessary to simplify things.

If there exists such a SW developer with the skills to do the above, by all means ... reply to this post. State your demands!
 
Here are the hash variants you want:
blake512->bmw512->groestl512->skein512->jh512->keccak512->luffa512->cubehash512->shavite512->simd512->echo512
By the way, you FPGA guys should start preparing 64-bit word hash code, because much of the open source code is only 32 bit.
 
Are you able to break it down for me a little more? Do you need to hash all 11 algos until you solve each one? If so, do you submit 11 nonces?
 
- Work through the source code and specify the format of the input digest to the hash chain, the exact order of the hashes, which variants of the hashes are used, etc.
- Modify the code to dump data streams of nonces moving through the hashchain. For a given input digest, show all intermediate hash values for a sequence of nonces. Verify the integrity of the modified X11 hashchain by verifying accepted pool shares. This dataset will be important as it can be used to verify the FPGA implementation at a low level. We can split the FPGA work into blocks and use this test data to verify a block at a time.

I worked off the darkcoin block verification code rather than that miner, as I figured a straightforward implementation would be easier to look at than a highly optimized one.

The data being hashed is the block header (80 bytes total). It is in the following format:
(4 bytes little endian) version
(32 bytes little endian) previous block hash
(32 bytes little endian) merkle root hash
(4 bytes little endian) time
(4 bytes little endian) bits (nBits; in Bitcoin-derived code this is the compact encoding of the difficulty target)
(4 bytes little endian) nonce
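
The layout above can be sketched in Python as follows. This is a minimal sketch and not the Darkcoin code: `serialize_header` is a name made up for illustration, and the sample values are those of block 79695 from the example further down the thread. The two hashes are passed in block-explorer (display) order, so the sketch byte-reverses them to get the in-memory order.

```python
import struct

def serialize_header(version, prev_hash_hex, merkle_root_hex, time, bits, nonce):
    """Pack the six header fields into the 80 bytes that are fed to blake512.

    Hypothetical helper for illustration only. The two hashes are hex strings
    in display order and are reversed into in-memory (little endian) order.
    """
    prev_hash = bytes.fromhex(prev_hash_hex)[::-1]
    merkle_root = bytes.fromhex(merkle_root_hex)[::-1]
    header = (
        struct.pack("<I", version)                # 4 bytes, little endian
        + prev_hash                               # 32 bytes
        + merkle_root                             # 32 bytes
        + struct.pack("<III", time, bits, nonce)  # 3 x 4 bytes, little endian
    )
    assert len(header) == 80
    return header

# Values of block 79695, as used in the example later in the thread.
header = serialize_header(
    2,
    "000000000003ef7c942336b52405cb8cba63848e74762f892de100bf645f7a91",
    "9013a9db46bd1872c1b95ee12add669d631d32853fdc80b1643189947ee19828",
    1401847865,
    453957317,
    6893042,
)
print(header.hex())
```

The printed string can be compared against the "Combined for hashing" line of the dump to check the byte order.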

This chunk of data is fed into blake512. The blake512 hash (just the resulting hash; none of the header is used again) is hashed with bmw512. The same thing continues through the whole chain, with the hash produced by one function being hashed by the next. The chain is blake512 -> bmw512 -> groestl512 -> skein512 -> jh512 -> keccak512 -> luffa512 -> cubehash512 -> shavite512 -> simd512 -> echo512. Only the first 32 bytes (256 bits) of the echo512 hash are used; the second half of the hash is just discarded.
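
The chaining itself is simple enough to sketch. The code below only shows the plumbing: Python's standard library does not ship most of these eleven functions, so `placeholder` substitutes SHA-512 with a name prefix for every stage. Those stand-ins are NOT the named algorithms and will not reproduce real X11 values; real implementations (for example bindings to sphlib) would have to be swapped in. `x11_chain` and `placeholder` are names made up for this sketch.

```python
import hashlib

X11_ORDER = [
    "blake512", "bmw512", "groestl512", "skein512", "jh512", "keccak512",
    "luffa512", "cubehash512", "shavite512", "simd512", "echo512",
]

def placeholder(name):
    """Stand-in with a 64-byte output. NOT the real algorithm of that name."""
    return lambda data: hashlib.sha512(name.encode() + data).digest()

def x11_chain(header, stages):
    """Feed the header through each (name, func) stage in order.

    Returns the per-stage trace as (name, input, output) tuples plus the
    final result, which is the first 32 bytes of the last stage's output.
    """
    trace = []
    data = header
    for name, func in stages:
        out = func(data)
        assert len(out) == 64, "every stage must produce a 512-bit hash"
        trace.append((name, data, out))
        data = out  # only the hash moves on; the header is not reused
    return trace, data[:32]

stages = [(name, placeholder(name)) for name in X11_ORDER]
trace, final = x11_chain(bytes(80), stages)
for name, inp, out in trace:
    print(name, out.hex())
```

Keeping the per-stage trace is what makes this useful for FPGA work: each (input, output) pair can serve as a test vector for one hash block in isolation.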

For dumping values for testing, I whipped up a program that takes the header values from the command line and dumps out each step of the way. Disclaimer: really shitty code, and a lot of stuff copied and pasted from the darkcoin source. https://mega.co.nz/#!fMRVBSIa!VH7eGk9iyg2mPdA-MdAGeTGs5nm8GsFnm7_VpnPgpl4 Pool share verification should be unnecessary, as one can plug in values from the blockchain and see that the result comes out the same as the block hash shown in the block explorer.

example:
for block 79695
block explorer (click raw block to see the important values): http://chainz.cryptoid.info/drk/block.dws?79695.htm
program command (bits is passed as a decimal integer because I was getting lazy, in case you can't figure out where 453957317 came from):
Code:
x11dump.exe 2 000000000003ef7c942336b52405cb8cba63848e74762f892de100bf645f7a91 9013a9db46bd1872c1b95ee12add669d631d32853fdc80b1643189947ee19828 1401847865 453957317 6893042
and the output is:
Code:
nVersion:       2
hashPrevBlock:  000000000003ef7c942336b52405cb8cba63848e74762f892de100bf645f7a91
hashMerkleRoot: 9013a9db46bd1872c1b95ee12add669d631d32853fdc80b1643189947ee19828
nTime:          1401847865
nBits:          453957317
nNonce:         6893042
Combined for hashing:
02000000917a5f64bf00e12d892f76748e8463ba8ccb0524b53623947cef0300000000002898e17e94893164b180dc3f85321d639d66dd2ae15eb9c17218bd46dba9139039808e53c5d60e1bf22d6900
Hash 1: blake512
input: 02000000917a5f64bf00e12d892f76748e8463ba8ccb0524b53623947cef0300000000002898e17e94893164b180dc3f85321d639d66dd2ae15eb9c17218bd46dba9139039808e53c5d60e1bf22d6900
output: a3d4ca17aefae732402b4a236d0ba5818fb9263cea3ab731d6e0e5ad4338906fd6035fa803931ecc27f66c11b2699e2d0f2da3a3e9cf93f064f6fed0c49ac031

Hash 2: bmw512
input: a3d4ca17aefae732402b4a236d0ba5818fb9263cea3ab731d6e0e5ad4338906fd6035fa803931ecc27f66c11b2699e2d0f2da3a3e9cf93f064f6fed0c49ac031
output: 89c3c3217f1ddda9307773b0f02b317966f2e881b0138417b35cbf74dd67bdec593e3eec98669c4ef05a2b0889179bab174cf16e19b57e64cc20ccd8b4e92a35

Hash 3: groestl512
input: 89c3c3217f1ddda9307773b0f02b317966f2e881b0138417b35cbf74dd67bdec593e3eec98669c4ef05a2b0889179bab174cf16e19b57e64cc20ccd8b4e92a35
output: c5753e3735813ceeb8d6cd566cf482f374ae13b7bd9cf4ad896ba53c726e52c2299bc21b60aa2b7d9dafb35d160031137d0451643f8b96cd2eedbbf7ede2c691

Hash 4: skein512
input: c5753e3735813ceeb8d6cd566cf482f374ae13b7bd9cf4ad896ba53c726e52c2299bc21b60aa2b7d9dafb35d160031137d0451643f8b96cd2eedbbf7ede2c691
output: 3374d75a22434b825e5fe49f0f9615d837b779d6beaef99e2ee18218732be69da97bc14c4373bffe791026684b5203a1cdf4cff3c129bd328e72db34f9f11fc1

Hash 5: jh512
input: 3374d75a22434b825e5fe49f0f9615d837b779d6beaef99e2ee18218732be69da97bc14c4373bffe791026684b5203a1cdf4cff3c129bd328e72db34f9f11fc1
output: 3600ae5de6b0cd7e67ea5f8ccc14b3bdd8794dc315d303aa8b2b2c5547d409b6175e096a8502f2b8072c7750428422b0b74a4e6640149583b89bed7f9bcbab86

Hash 6: keccak512
input: 3600ae5de6b0cd7e67ea5f8ccc14b3bdd8794dc315d303aa8b2b2c5547d409b6175e096a8502f2b8072c7750428422b0b74a4e6640149583b89bed7f9bcbab86
output: 8b292eac29e627290ef3e919373a8f191f5baf5da7e0f4402acdcb7cef37b9ec20c71569eb5b63c5ce2edec9fa5c7b1ebaa687fc6c28bdfbce8d77d23bec1ed7

Hash 7: luffa512
input: 8b292eac29e627290ef3e919373a8f191f5baf5da7e0f4402acdcb7cef37b9ec20c71569eb5b63c5ce2edec9fa5c7b1ebaa687fc6c28bdfbce8d77d23bec1ed7
output: f21851164bb075bc598e3a6587420b606e6906f183a9b94d713e393026a74fa58239adef113b4ce633b1fb2c106b2d713442a27653abfc2d7c738a134f4eedbf

Hash 8: cubehash512
input: f21851164bb075bc598e3a6587420b606e6906f183a9b94d713e393026a74fa58239adef113b4ce633b1fb2c106b2d713442a27653abfc2d7c738a134f4eedbf
output: ea7a9fcdcb5c4fe53ed239b1a468005ba3f4f4a4fd1a12752f6f71cccbda5d06601059d324104a28bc945a9cd2fc690db986e5caeb82676b1f021b593d8c459a

Hash 9: shavite512
input: ea7a9fcdcb5c4fe53ed239b1a468005ba3f4f4a4fd1a12752f6f71cccbda5d06601059d324104a28bc945a9cd2fc690db986e5caeb82676b1f021b593d8c459a
output: e3ec7fb3adc45af9b0ad7e02a55dc39477ccb2b5a15c1fa71fe2c3f499d9ef8037fdc75436c59cddcced300d640b348758b9ad3f941fc7316e997e3df9cb843e

Hash 10: simd512
input: e3ec7fb3adc45af9b0ad7e02a55dc39477ccb2b5a15c1fa71fe2c3f499d9ef8037fdc75436c59cddcced300d640b348758b9ad3f941fc7316e997e3df9cb843e
output: d46905bf6b915d5d88d35b5aee5e448eb658ad1ca9f5904b90fe3abe32355aa072e38b7e7e5721443b88beedf09d23af022adea932b16dbca64e201c8de7f1a6

Hash 11: echo512
input: d46905bf6b915d5d88d35b5aee5e448eb658ad1ca9f5904b90fe3abe32355aa072e38b7e7e5721443b88beedf09d23af022adea932b16dbca64e201c8de7f1a6
output: c347cb8077e2cb7ce01b99d56e91d916588761d510d8352f3c2f01000000000019a3e6d8c882a0be029f08f8c869ad2508ddf67cf19941b6337922ae14f485bb

trimmed:
Hash: 0000000000012f3c2f35d810d561875816d9916ed5991be07ccbe27780cb47c3
The ending hash might not look like the echo512 output at first, but that's due to the endianness. The input/output lines for each hash dump the data one byte at a time, exactly in the order it sits in RAM, but where the 256-bit hashes are displayed, they're flipped around.
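
The flip can be checked with a few lines of Python. The sketch below takes the echo512 output from the dump, keeps the first 32 bytes and reverses them. It also expands the bits value into a target, under the assumption that Darkcoin inherits Bitcoin's compact target encoding for that field (the thread does not confirm this); `display_hash` and `compact_to_target` are names made up for illustration.

```python
def display_hash(echo512_output):
    """First 32 bytes of the final hash, byte-reversed for display."""
    return echo512_output[:32][::-1]

def compact_to_target(bits):
    """Expand the compact 'bits' field into a full target.

    Assumes Bitcoin's compact encoding: top byte is a byte-length exponent,
    low 23 bits are the mantissa (the sign bit is ignored here).
    """
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))

# echo512 output for block 79695, copied from the dump above.
echo_out = bytes.fromhex(
    "c347cb8077e2cb7ce01b99d56e91d916588761d510d8352f3c2f010000000000"
    "19a3e6d8c882a0be029f08f8c869ad2508ddf67cf19941b6337922ae14f485bb"
)
shown = display_hash(echo_out)
target = compact_to_target(453957317)

print(shown.hex())
print(int.from_bytes(shown, "big") < target)
```

If the assumption about the encoding holds, the second line tells you whether the hash satisfies the block's difficulty target, which is the comparison the FPGA ultimately has to make for each nonce.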
 
Fusecavator, thank you so much. Everything I needed. This means an FPGA could run 11 hashes at the same time, limited by the largest nonce required by one of the 11 algos. Awesome! Let's start with just getting 1 hash at a time going.
 
Thanks much fusecavator =)

With that out of the way, I'm a software dev and total newb to FPGAs but I have a Virtex-5 LX50. Think I can fit X11 on it?

Also, not sure what all I can do to help with this effort but as it progresses I'd like to help.
 
bodhi,

Well fusecavator did most everything in the original statement of work. The only thing he didn't do is:

"Modify the open source getwork scripts currently used on other altcoins to interface to Darkcoin specific FPGA HW. This might be over a serial connection or through an Altera USB Blaster cable. This is likely not as arduous as it sounds because it won't need to be much different from what's out there for other altcoins and it can rely on a proxy_miner type program if necessary to simplify things."

If you're comfortable with python and tcl maybe you'd be interested in looking into how to interface an FPGA miner to a mining pool? My hardware uses an Altera USB Blaster interface to communicate between the FPGAs and the PC host. Anyone else on the forum use a different kind of interface we should consider supporting as well? I think the open source projects all support the Blaster interface so that's probably the lowest common denominator. Other interfaces could be added after the fact.

Have a look at the scripts used for https://github.com/kramble/FPGA-Litecoin-Miner and https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner. They show how to interface the USB Blaster HW to a getwork pool server. We could simply modify these scripts for Darkcoin assuming the getwork interface is similar and then look for a pool with a getwork interface, or look for/create a stratum_proxy to use, or try to get fancy and add stratum support from the start.

Does any of this sound interesting?
 
I would imagine the counters are for keeping track of round iterations, but I haven't looked at Blake much yet. I've looked a little bit at the Skein hash and it has counters to keep track of rounds and to know when to mux in feedback data or new data. I've been caught up in end-of-school-year stuff lately and haven't done much other than peeking at Skein. There's a bit of a disconnect there because the University code for Skein (as well as the 10 other hashes I expect) does a good job of demonstrating the 256 bit implementation, but there aren't a lot of clues how to extrapolate to 512 for X11. There are also some "optional" implementation details for Skein that are unclear as to whether they exist in X11 or not. This is going to take a while.
 

The darkcoin code directly uses sphlib (http://www.saphir2.com/sphlib/) for its hashes, so the documentation can likely clear up those issues. (The page isn't loading at the time of writing, but it was working for me not too long ago, so it's probably just temporary downtime.) There actually is a warning about that on that page:
*************************************************************************
IMPORTANT NOTE: for users of the previous version (sphlib-2.1)
--------------------------------------------------------------
BLAKE, Groestl, JH, Keccak and Skein have been updated, to match the "tweaked" specifications published for the third round of the SHA-3 competition. Thus, these function now return distinct values from what they were producing previously. Also, for Skein with a 224-bit or 256-bit output, the size of the context structure has changed, so calling code must be recompiled as well.
*************************************************************************
I'm guessing darkcoin is using the updated version 3, but I'll compare the source later (I don't have sphlib-3 on this computer and can't download it while the site is down, but I've got it stored elsewhere).
 
I haven't looked enough into Skein; however, Blake just widens all words from 32-bit to 64-bit when going from 256 to 512.
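
To make the word-width point concrete, here is a sketch of BLAKE's G function parameterised by word size, based on my reading of the BLAKE specification. Besides the word width, the rotation distances differ between the two variants (16, 12, 8, 7 for the 32-bit version and 32, 25, 16, 11 for the 64-bit one), as do the constants, the initial values and the round count, so "just widen the words" is a good first approximation but not the whole story. `make_g` is a name made up for this sketch.

```python
def make_g(word_bits, rotations):
    """Build a BLAKE G function for the given word size and rotation set."""
    mask = (1 << word_bits) - 1
    r0, r1, r2, r3 = rotations

    def rotr(x, n):
        return ((x >> n) | (x << (word_bits - n))) & mask

    def g(a, b, c, d, mx, my, cx, cy):
        """mx, my: the two message words; cx, cy: the two round constants
        selected by the round permutation for this G call."""
        a = (a + b + (mx ^ cy)) & mask
        d = rotr(d ^ a, r0)
        c = (c + d) & mask
        b = rotr(b ^ c, r1)
        a = (a + b + (my ^ cx)) & mask
        d = rotr(d ^ a, r2)
        c = (c + d) & mask
        b = rotr(b ^ c, r3)
        return a, b, c, d

    return g

g256 = make_g(32, (16, 12, 8, 7))    # 32-bit words
g512 = make_g(64, (32, 25, 16, 11))  # 64-bit words, the variant X11 uses

print([hex(v) for v in g512(1, 2, 3, 4, 5, 6, 7, 8)])
```

In hardware terms each G call is four adders, four XOR stages and four fixed rotations, and the rotations are free because they are just wiring.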
 
It looks like Skein's 512 (as well as all other Skein implementations) is based on repeated 64 bit adder entities. Going to 512 from 256 just doubles the number of adders. The only missing piece then is how to tie in the tweak calc (which remains the same size for all widths) to a wider round width. I'm getting there.
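
On tying the fixed-size tweak into a wider state: as I read the Skein specification, Threefish's key schedule always adds the tweak words to the third-from-last and second-from-last subkey words and the subkey counter to the last word, whatever the block width. A minimal sketch under that reading follows; `mix` and `subkey` are illustrative names, the rotation distance is left as a parameter instead of quoting the spec's tables, and all of it should be checked against sphlib before relying on it.

```python
MASK64 = (1 << 64) - 1
C240 = 0x1BD11BDAA9FC1A22  # key schedule parity constant (Skein spec v1.2 and later)

def mix(x0, x1, rotation):
    """Threefish MIX: one 64-bit add, one left rotate (0 < rotation < 64), one XOR."""
    y0 = (x0 + x1) & MASK64
    rotated = ((x1 << rotation) | (x1 >> (64 - rotation))) & MASK64
    return y0, rotated ^ y0

def subkey(key_words, tweak, s):
    """Subkey number s for a state of len(key_words) 64-bit words.

    tweak is (t0, t1); it stays 128 bits for every block width.
    """
    nw = len(key_words)
    k = list(key_words)
    parity = C240
    for word in k:
        parity ^= word
    k.append(parity)  # extra key word k[nw]
    t = [tweak[0], tweak[1], tweak[0] ^ tweak[1]]
    sub = [k[(s + i) % (nw + 1)] for i in range(nw)]
    sub[nw - 3] = (sub[nw - 3] + t[s % 3]) & MASK64
    sub[nw - 2] = (sub[nw - 2] + t[(s + 1) % 3]) & MASK64
    sub[nw - 1] = (sub[nw - 1] + s) & MASK64
    return sub

# With an 8-word (512-bit) state the tweak lands in words 5 and 6.
print(subkey([0] * 8, (5, 9), 0))
```

Under this reading, going from 256 to 512 bits doubles the number of MIX units per round (two to four) while the tweak logic stays the same size, which matches the adder count observation above.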
 
atavacron, great find for the hash functions on github. I think the java implementations will translate much more easily to Verilog than C. I spent some time digging through there. Why don't you, Sbatto, and fusecavator put a crypto coin address in your forum signature so we can give you more than just likes for gems like this.
 
Does anyone know what the counters t0 and t1 are in the blake algo? https://131002.net/blake/blake.pdf
Glamorgoblin, have you got any of the algos working? Maybe we can work on different algos and combine?

typedef unsigned long sph_u64;
#define SPH_C64(x) ((sph_u64)(x ## UL))
T0 = SPH_C64(0xFFFFFFFFFFFFFC00)
T1 = 0xFFFFFFFFFFFFFFFF

So they are just constants for our purpose here.
Is that what you wanted to know?
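
For reference, the BLAKE paper defines the counter t (split into t0 for the low word and t1 for the high word) as the number of message bits hashed so far, with t = 0 for a block that contains only padding. BLAKE-512 works on 1024-bit blocks, so the fixed 80-byte (640-bit) header fits in a single block with t0 = 640 and t1 = 0, which agrees with treating the counters as constants for mining. The sketch below works that out from the paper's padding rule; it is not derived from the sphlib internals, and how the values quoted above map onto it is something I have not traced. `blake512_counters` is a made-up name.

```python
MASK64 = (1 << 64) - 1
BLOCK_BITS = 1024  # BLAKE-512 block size

def blake512_counters(message_bits):
    """Return the (t0, t1) counter pair for each compressed block.

    Padding rule: append a 1 bit, then 0 bits until the length is congruent
    to 895 mod 1024, then a 1 bit and the 128-bit message length.
    """
    extended = message_bits + 1                       # at least one padding bit
    extended += (895 - extended) % BLOCK_BITS         # zero fill
    total_blocks = (extended + 1 + 128) // BLOCK_BITS
    counters = []
    for i in range(total_blocks):
        start = i * BLOCK_BITS
        end = min(message_bits, start + BLOCK_BITS)
        t = end if end > start else 0                 # padding-only block uses 0
        counters.append((t & MASK64, t >> 64))
    return counters

print(blake512_counters(80 * 8))  # the X11 case: one block holding the header
```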
 
So, I've poked through the X11 hashes enough now to get the feeling that it will take a LARGE FPGA to fit it all in. Even with a large FPGA it will probably take a fair amount of rolling or folding to squeeze everything down. That got me thinking about a "practical" FPGA board architecture for X11. If anyone is developing a custom board for X11 FPGA work consider this approach:

One FPGA sized to fit just two instances of the largest hash machine in the X11 hashchain. Use one of the more recent FPGAs that support dynamic reconfiguration. Attach wide and fast DDR3 or equivalent memory externally. Connect a small microcontroller to the configuration port of the FPGA. Partition the FPGA into two dynamically reconfigurable hash spaces (slots A and B). First the micro programs slot A with the Blake hash machine and loads the initial header. The Blake machine runs a sequence of nonces, storing the intermediate hashes in the external RAM, which should be able to hold 2K hashes. While the Blake machine is running in slot A, the micro programs the BMW machine into slot B. Once the RAM is full, slot B starts processing the hashes in memory, overwriting the Blake hashes in external memory with BMW hashes. While BMW is running, the micro reconfigures slot A with the Groestl machine (where Blake used to be). After BMW is finished, slot A overwrites the BMW hashes with Groestl hashes and slot B gets reconfigured with Skein. This continues until all the hashes have executed.
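
The ping-pong between the two slots can be laid out as a simple schedule. This is only a sketch of the sequencing described above, not controller firmware: `slot_schedule` is a made-up name, and the buffer size reads "2K hashes" as 2048 entries of 64 bytes each, which is my assumption.

```python
X11_ORDER = [
    "blake512", "bmw512", "groestl512", "skein512", "jh512", "keccak512",
    "luffa512", "cubehash512", "shavite512", "simd512", "echo512",
]
HASH_BYTES = 64  # one 512-bit intermediate hash
BATCH = 2048     # assumed meaning of "2K hashes"

def slot_schedule(order):
    """For each stage: (running slot, hash, idle slot, hash loaded meanwhile)."""
    steps = []
    for i, name in enumerate(order):
        running = "A" if i % 2 == 0 else "B"
        idle = "B" if running == "A" else "A"
        upcoming = order[i + 1] if i + 1 < len(order) else None
        steps.append((running, name, idle, upcoming))
    return steps

for running, name, idle, upcoming in slot_schedule(X11_ORDER):
    if upcoming:
        note = f"reconfigure slot {idle} with {upcoming}"
    else:
        note = "nothing left to load"
    print(f"slot {running} runs {name:12s} | {note}")

print("external buffer:", BATCH * HASH_BYTES, "bytes")
```

With these assumptions the external buffer is only 128 KiB, so the scheme is bound by reconfiguration time and memory bandwidth far more than by memory capacity.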

This approach requires approximately 1/6 of the FPGA gates of a full implementation. It would run at about 1/10 the speed of a full single-device implementation, but with the exponential price curve of FPGAs it could wind up at 1/20 or 1/50 of the cost. You could build multiple instances of this and still come out dollars ahead for an equivalent hash rate.
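
The "dollars ahead" claim is just a ratio of ratios, worked through below with the estimates from this post (they are rough guesses, not measurements). `relative_hash_per_dollar` is a name made up for the sketch.

```python
from fractions import Fraction

def relative_hash_per_dollar(speed_ratio, cost_ratio):
    """Hash rate per dollar relative to the full single-device design."""
    return speed_ratio / cost_ratio

speed = Fraction(1, 10)  # estimated speed relative to a full implementation
for cost in (Fraction(1, 20), Fraction(1, 50)):
    gain = relative_hash_per_dollar(speed, cost)
    print(f"at {cost} of the cost: {gain}x the hashes per dollar")
```

So at the estimated price points the small reconfigurable board would deliver two to five times the hash rate per dollar, provided the speed and cost guesses hold.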

I'm going to target whatever X11 solution I get to my existing HW, but if anyone is developing custom X11 FPGA HW, please let me know. I'd be very interested in seeing how it goes.
 