DarkCoin FPGA Mining Co-op?

glamorgoblin

Hey,
So I've mined BTC, LTC, DOGE, etc. using FPGA HW in the past (I'm an FPGA/Verilog developer by trade). I've only worked on a small scale as a hobby though and was/am not looking to productize anything. Just nerding out on cryptos. All the FPGA mining I've done so far has been SHA256 or Scrypt, but I'm curious if anyone is interested in cooperating on an X11 mining FPGA design. I'm quite comfortable with the Verilog code, a novice with the SW side, and a noob when it comes to the network side. Are there any other crypto-nerds out there that want to tinker?

I have FPGA HW that I would be willing to share to facilitate cooperative development. Ideally I would find a SW-God that could abstract the existing DarkCoin C code to some register-friendly pseudo-code and provide some dumps of nonces proceeding through each of the different stages and substages. I could do the target-specific Verilog coding and validate simulations against the dumps. I assume we could do something similar to what the open source SHA256 and Scrypt FPGA guys do for interfacing to stratum/getwork.

My current FPGA HW can support 100MH/s for SHA256 mining and (assuming that none of the X11-specific hashes are significantly more difficult or require intensive memory like Scrypt) I would expect a similar hashrate for X11. If the 11 hashes can't fit into a single FPGA fully pipelined, the hashrate might halve or quarter to make room for all the logic though.

Just tossing it out there. What say ye?
 
Sounds great! I'm a SW developer but unfortunately I'm not a code god. All this register and pipeline talk is giving me a flashback to my assembly lab and I'm breaking out in a cold sweat. :tongue:
 
LZ,

Hey, maybe it's a good match then because every time I've tried to look through the C code I get flashbacks to my CE classes and break into a raging fit of apathy. Have you looked at the code over at https://github.com/ig0tik3d/darkcoin-cpuminer-1.2c? That's what I've been looking at until my eyes glaze over. I always get lost in code that has been written with "Good" coding style. If it were sloppy and hacky (the way I fumble through SW efforts) I'd probably be able to grasp it more easily. All the different levels of abstraction, data types, linking of a gazillion different file types ... sigh.

What REALLY helped when I did the Scrypt FPGA design was the document at https://tools.ietf.org/html/draft-josefsson-scrypt-kdf-01. That showed the entire algorithm in generic pseudo code and provided some example data streams of hashes progressing through the core. It was platform agnostic, simplified, and reasonably portable to Verilog. It seems something like that could be created for X11 by tinkering with the github code. Am I oversimplifying?
 
I've received a few PM's about this post, so I thought I'd give a status update here instead of replying individually.

During the evaluation of the candidates for SHA-3, evidently lots of universities developed generic FPGA code for each of the X11 hashes to benchmark their performance in real hardware. Since all 11 hashes in X11 were considered for the competition, they're all out there and can be found at https://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html. This is exactly what I was looking for and I think I can get what I need from there. It would still be nice to have a dump of nonces/digests progressing through the DarkCoin X11 specific hash chain so I can double-check my work as I go. There are some implementation-specific details of the hashes that I still have to figure out.

If you understand what I'm asking for when I say "a dump of the nonces/digests progressing through the DarkCoin X11 specific hash chain" and are interested in a cooperative effort where we both potentially wind up with X11 FPGA miners in the end please let me know. If the above statement doesn't make sense though, the chances of a meaningful partnership aren't very good.
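To make the request concrete, here is a rough Python sketch of the kind of dump tool I mean. Big caveat: the three stages below are stand-ins from Python's hashlib, NOT the real X11 functions (none of those ship with Python), and the names `STAGES` and `dump_chain`, the 76-byte header prefix, and the little-endian nonce packing are all assumptions for illustration. The real tool would chain the 11 actual hashes in DarkCoin's order.

```python
import hashlib
import struct

# STAND-IN stages only (assumption): the real X11 chain would plug in the
# 11 actual hash functions here, in the order DarkCoin applies them.
STAGES = [
    ("stage0_blake2b", lambda d: hashlib.blake2b(d, digest_size=64).digest()),
    ("stage1_sha512", lambda d: hashlib.sha512(d).digest()),
    ("stage2_sha3_512", lambda d: hashlib.sha3_512(d).digest()),
]


def dump_chain(header76: bytes, nonce: int):
    """Return [(stage_name, hex_digest), ...] for one nonce."""
    # Assumed layout: 76-byte header prefix followed by a 4-byte LE nonce.
    data = header76 + struct.pack("<I", nonce)
    rows = []
    for name, fn in STAGES:
        data = fn(data)  # the output of each stage feeds the next stage
        rows.append((name, data.hex()))
    return rows


if __name__ == "__main__":
    header = bytes(76)  # hypothetical all-zero header, for illustration only
    for nonce in range(2):
        for name, digest in dump_chain(header, nonce):
            print(f"nonce={nonce:08x} {name} {digest}")
```

One line per nonce per stage like this is all a Verilog testbench would need to compare against at each stage boundary.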
 
Well my current mining rig is a sea of Altera EP2S90's. These are pretty old parts (Stratix II), and are really expensive compared to more modern FPGA's of similar (or even far superior) performance. So, I'll leave it to the reader to do the math related to cost if a more appropriate part were used. I have all the FPGA horsepower I want and will simply retarget my current HW from scrypt mining to X11 mining when/if a solution is found, so I haven't done much thinking about optimization of cost/power/speed. My HW architecture is already set in stone so these characteristics will just be whatever they will be.
 
Any idea what FPGA might be large enough to support X11?
I just added up the slices required for each algorithm, and the sum is slightly larger than the total slice count of the Spartan 6 used in the popular miners.
However, considering that normally only about 80% of the slices can be used, it could be even worse, so I am wondering about combining the two Spartans on an Icarus.
The good thing about reusing Icarus is that there are plenty of existing resources to draw on, but the bad thing is that I do not have an Icarus or Lancelot. It should be noted that many other FPGA miners do not have interconnect pins between the FPGAs.
 
Other than a dual Spartan 6 board what single FPGA would be large enough, maybe a Virtex 5 or 6? By the way, how many slices did you calculate that the X11 algo would need?
 
The FPGA resource utilization numbers from the universities are for flat implementations. What you can see from Bitcoin (SHA256) FPGA coding though is that there are techniques to trade off speed for gates in an FPGA. If any part of any hash algorithm repeats the same type of operation at multiple points, the duplicate logic can be eliminated if the hash rate is slowed down enough to allow sharing of a single instance of the common logic. I haven't looked far enough into the university code yet to see how much opportunity there is for this type of technique, but I'd be surprised if the design can't be maneuvered into a reasonably sized FPGA with enough work in this area. That's why I listed a 100MH/s rate with the possibility of halving or quartering. Halving will result from sharing common logic across 2 paths. Quartering will result from sharing common logic across 4 paths.
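As a back-of-the-envelope sketch of that trade-off (an idealized model that ignores any control overhead; `folded_hashrate` and `share_factor` are names made up for this example), the rate simply divides by the number of paths sharing one logic instance:

```python
def folded_hashrate(full_rate_mhs: float, share_factor: int) -> float:
    """Hashrate when one logic instance is time-shared across share_factor paths.

    Idealized assumption: sharing costs nothing beyond the extra clock cycles.
    """
    if share_factor < 1:
        raise ValueError("share_factor must be >= 1")
    return full_rate_mhs / share_factor


# 100 MH/s fully unrolled, then shared across 2 and 4 paths.
for k in (1, 2, 4):
    print(f"shared across {k} path(s): {folded_hashrate(100.0, k):.0f} MH/s")
```

How much logic each level of sharing actually frees up depends on the individual hash, which is the part that still needs investigating.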

Interconnected FPGAs would work, but it would be difficult to find a place in the hash chain to break up the design without drastically affecting the overall hashrate. The intermediate hashes are likely 1Kb each or larger, so even if the FPGA interconnect can somehow support 1Gb/s (rather unlikely), this still limits the overall hashrate to 1MH/s if the interconnect is fully saturated. We're still far better off using the common-logic technique described above than stretching X11 across board interconnects.
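The arithmetic behind that 1MH/s ceiling, as a tiny sketch (decimal units assumed, i.e. 1Gb/s = 1e9 bits/s and 1Kb = 1e3 bits; the function name is just for illustration):

```python
def interconnect_limit_hps(link_bits_per_s: float, bits_per_hash: float) -> float:
    """Upper bound on hashes/s when every hash must cross the link once."""
    return link_bits_per_s / bits_per_hash


limit = interconnect_limit_hps(1e9, 1e3)  # 1 Gb/s link, 1 Kb per intermediate hash
print(f"{limit / 1e6:.1f} MH/s")
```

If the intermediate digests turn out to be smaller than 1Kb the bound rises proportionally, but it is still set by the link rather than by the logic.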
 
Oh, OK. I think I've got it now. Thanks for the explanation. The more I study FPGAs the more fascinating I find them.
 
One more thing: the unrolled pipeline is used to increase speed at the cost of resources, right?
We have the resource problem because we want to implement 11 algorithms in one chip, so either optimization is required or we just get a better FPGA.
 
I hope this becomes a serious approach. How about
- A crowd funded or shareholder financed open source project for an X11 mining box, under the DRK umbrella.
- Running on off the shelf, freely available FPGA and control hardware. I have close to no knowledge about FPGAs.
Which boards would suit best? Would this one be OK? http://www.opalkelly.com/products/xem7350/
Controlled by a RaspPi?
What hash rates could be achieved?
 
The SHA256 FPGA designs (bitcoin) use the concept of "unrolled" and "pipelined" logic as well. That's their terminology for the design approaches that don't share any common logic anywhere, so no part of the process has to wait for shared logic to become available and the entire machine can run as fast as possible. Folks with smaller FPGAs can still mine, just at lower hash rates, while those with big FPGAs can implement the fully unrolled design and get the most bang for their buck. It looks like there are examples of this in the SHA-3 candidate code from the universities as well, although they might be calling it "folding" rather than "rolling".
 
That makes sense to me. If you don't have to wait for a logic block to become "not busy" then your speed will be much quicker.
So if there is unused space on the FPGA, one can place multiple pipelines for parallel processing? How do you determine how many logic blocks or pipelines can fit on a given FPGA? I would suppose it's more of an art form because there can be multiple ways of doing the same logic, yet the size could be different.
 
Sizing into an FPGA is REALLY tough. It requires intimate knowledge of the hash function to know what pieces can be shared without corrupting the results. Kramble's Scrypt code on github is a great example of shared logic where more than a dozen passes through the SHA-256 function are required according to the scrypt definition, but there only exists one SHA-256 logic block in the design that gets used over and over. There is a rather sophisticated control mechanism required to manage it all and keep everything straight, but as far as resource utilization is concerned it uses very few gates and very little power.

The "pipeline" concept isn't really a parallel logic approach as you've described. A "fully pipelined" design has organized the entire function into logical steps without any interdependence between steps. So even though it may take 4 steps (for example) to execute the function for any given nonce, at any given time there are 4 nonces in the "pipeline". When nonce1 completes step1, it moves on to step2 and nonce2 begins step1. This results in a machine where the number of required steps is irrelevant. The output is simply a constant stream of hashes since a new nonce is always just completing the last step in the process.
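Here is a toy software model of that behaviour, just to show the timing (a sketch only: `run_pipeline` is a made-up name, and each "step" does no actual hashing). Every clock cycle each nonce advances one step and a new nonce enters step 1:

```python
def run_pipeline(nonces, num_stages=4):
    """Return [(cycle, nonce), ...] showing when each nonce finishes."""
    stages = [None] * num_stages  # stages[i] holds the nonce in step i+1
    pending = list(nonces)
    completed = []
    cycle = 0
    while pending or any(s is not None for s in stages):
        cycle += 1
        # Clock edge: everything shifts one step, a new nonce enters step 1.
        new = pending.pop(0) if pending else None
        stages = [new] + stages[:-1]
        # Whatever now sits in the last step finishes during this cycle.
        if stages[-1] is not None:
            completed.append((cycle, stages[-1]))
    return completed


for cycle, nonce in run_pipeline(range(6)):
    print(f"cycle {cycle}: nonce {nonce} done")
```

The first result shows up after 4 cycles (the fill latency), and after that one nonce completes every cycle no matter how many steps the pipeline has.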
 