DarkCoin FPGA Mining Co-op?

I have one of those SoC FPGA devices, which at first seems like it could be a fit for your plan to reprogram the FPGA for every hash. However, I think you need to redo some of the math here. The board I have has a 32-bit DDR interface running at 400 MHz (800 MT/s), which comes out to 25.6 Gbit/s. Since the result of each hash is 512 bits, this gives you roughly 50M hashes per second into or out of memory. Each hash algorithm has to read the previous hash and then write its output, halving the bandwidth, so 25M raw hashes per second. Divided by 11 algorithms, this gives a best-case performance of about 2.3 MH/s, i.e. quite a bit less than a 750 Ti. This also calls into question the idea of fully unrolling the hashes, since 25 MH/s per algorithm should only require 2 or 4 unrolls. Obviously there is a lot to be gained by fitting multiple hashes into the chip at one time and avoiding the memory.
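The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in the post; the function and parameter names are made up for illustration, and it ignores DDR protocol overhead, so the real bound would be lower.

```python
def memory_bound_hashrate(bus_bits=32, transfers_per_s=800e6,
                          hash_bits=512, algorithms=11):
    """Upper bound on X11 hashes/s when every intermediate hash
    makes a round trip through external memory."""
    bandwidth = bus_bits * transfers_per_s   # raw bits per second (25.6 Gbit/s)
    hash_moves = bandwidth / hash_bits       # 512-bit transfers per second (~50M)
    per_algo = hash_moves / 2                # each stage reads one hash and writes one
    return per_algo / algorithms             # 11 chained stages per X11 hash


rate = memory_bound_hashrate()
print(f"{rate / 1e6:.2f} MH/s")  # about 2.27 MH/s
```

Changing `algorithms` to 1 gives the 25 MH/s per-algorithm figure used in the unrolling argument.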

I have only looked into Blake and BMW so far. They are both wide Merkle-Damgård hash constructions. If all the other hashes are similar, it should be possible to build some kind of construction where the inner loops of the hash run on a kind of automaton, so that resources like these huge numbers of 64-bit adders can be reused across algorithms. In that case maybe all the hashes could fit in a single "generic hash" module. If this module could run at more than 100 MHz, total performance could be as high as 10 MH/s, which is a lot better IMHO than what you get going through memory.
 
So, you're saying that it's possible to get a whole 10 MH/s out of my old Spartan?
 
My chip is a Cyclone V with 45k ALMs, 166k registers and 5570 300 MHz block RAMs. It still isn't clear that my design is small enough, or that you could fit enough copies of it inside. You would need a very large Spartan-6 to even think about it. Essentially, what I am proposing is a type of CPU. This processor will have a fixed program memory, no branching capabilities and, I think, no memory access: just operations on a hybrid memory/register file. Instructions will be optimized for hashing. So, for example, for Keccak we could have an instruction that calculates parity on 5 register locations in parallel.
Having a lot of logic sitting around that is not used every cycle can end up creating a lot of waste. So, where there is enough space, it might be best to have a different CPU for each algorithm. At that point it starts to sound a lot like the usual implementation of a hash, but I think it is not. None of these FPGA hash implementations make use of features like block RAM, so they need much more register usage to achieve the same amount of pipelining.
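For readers wondering what a "parity on 5 register locations" instruction would be doing: a plausible reading (my assumption, not something the post spells out) is the column-parity part of Keccak's theta step, where each of the 5 columns of the 5x5 state of 64-bit lanes is XORed down to one lane. A minimal software sketch, with illustrative function names:

```python
MASK64 = (1 << 64) - 1


def rotl64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def column_parity(state):
    """XOR the 5 lanes of each column: C[x] = A[x][0] ^ ... ^ A[x][4]."""
    return [state[x][0] ^ state[x][1] ^ state[x][2] ^ state[x][3] ^ state[x][4]
            for x in range(5)]


def theta(state):
    """Keccak-f[1600] theta step on a 5x5 list of 64-bit lanes, state[x][y]."""
    c = column_parity(state)
    d = [c[(x - 1) % 5] ^ rotl64(c[(x + 1) % 5], 1) for x in range(5)]
    return [[state[x][y] ^ d[x] for y in range(5)] for x in range(5)]


# Example: a single set bit spreads into two neighbouring columns.
state = [[0] * 5 for _ in range(5)]
state[0][0] = 1
mixed = theta(state)
```

In hardware the five XORs per column are independent, which is why a single wide "parity" instruction over a register file is attractive here.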
 
hi there,
I'm new here, as you can see, and I'm very interested in the development of an X11 FPGA miner, but since I'm only a software dev I can't really help in my current "state". Where do you think I should start to get into it? Are there any specifications or documentation for X11? Mining software?
 
From what I understood, it's possible to reprogram the Spartan-6 to hash X11, but it's not worth it? Sigh... Well, if anyone is in need of an old FPGA, hit me up. Otherwise, I'm putting my old miner back into storage to collect dust, lol.
 
I just stumbled onto this thread. I'll be watching closely. Most of it is way over my head, but I find it very interesting.
 
I've got a Nexys 2 board with a Spartan-3E 500.
It has been a while since I did any Xilinx work, about 3 years; I mostly do OOP software dev now.
 
Sooo, has anyone made any progress on an X11 FPGA??
It was determined that the SHA-3 candidates are too complicated for affordable FPGAs. Even single-hash PoW functions using SHA-3 candidates can't be unrolled enough to produce high enough hash rates. Implementing all 11 used in X11 would require a massive FPGA, and the results would be poor. IIRC, testing showed that Skein running on a Stratix board only managed about 10% of the hashrate of a Radeon 7950, so there really isn't any profit to be made unless you're already sitting on a large FPGA farm of very expensive boards.
 
Mmmmh... I've talked to an FPGA dev who told me the opposite. You can solve each of the 11 algos in a separate chip/core and parallelize the work. But I've no idea if that's worth trying if you only want FPGAs. I think he was working on ASIC designs.
 
Isn't it possible to interconnect the 4 Spartan-6 XC6SLX150 FPGAs on the board designed by ZTEX?

[Image: usb-fpga-1.15y2-400.jpg]
 
Space is at a premium on FPGAs. Unlike GPUs, which process the same instruction on many cores at a time, each part of the FPGA only operates on one thing at a time, essentially with a stream of data flowing through the logic. To get good performance, loops have to be unrolled where possible, since otherwise one input holds up the next while it passes through the same section multiple times. The 11 algos would be separate sections and would run in parallel like you were told; the problem is fitting them unrolled enough to get decent performance. You end up having to make major performance/size tradeoffs, and the performance goes to shit. There might be high-end FPGAs that could fit it, but they would be extremely costly.

I don't know very much about ASIC designs, but I'm guessing you have more flexibility regarding design size, your cost per chip would be much lower, and you'd draw less electricity, so those problems likely wouldn't affect ASICs as much.
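To put rough numbers on the unrolling tradeoff, here is a toy model: a core with `unroll` round stages in hardware has to loop each input through `ceil(rounds / unroll)` times, so it accepts a new hash only that often. The clock speed and round count below are illustrative assumptions, not measurements of any real X11 core, and the model ignores routing, timing closure and area.

```python
import math


def pipeline_throughput(clock_hz, rounds, unroll):
    """Hashes per second for a core with `unroll` round stages in hardware."""
    cycles_per_hash = math.ceil(rounds / unroll)  # passes through the loop
    return clock_hz / cycles_per_hash


def min_unroll(clock_hz, rounds, target_hz):
    """Smallest unroll factor that reaches `target_hz`, or None if impossible."""
    for unroll in range(1, rounds + 1):
        if pipeline_throughput(clock_hz, rounds, unroll) >= target_hz:
            return unroll
    return None


# Hypothetical 16-round hash at 100 MHz.
for unroll in (1, 2, 4, 16):
    rate = pipeline_throughput(100e6, 16, unroll)
    print(f"unroll={unroll:2d}: {rate / 1e6:6.2f} MH/s")
```

Each extra unrolled stage costs roughly another copy of the round logic, which is where the space problem comes from.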
 
Is it really not possible to build a cluster of 11 FPGAs in parallel, each dedicated to one algorithm?
 
Space isn't just a problem for fitting all the algorithms on one board. Even a single algorithm (a 512-bit SHA-3 candidate) alone on a board gets poor performance, since it can't be unrolled enough and still fit on an affordable board.
 
The more I think about this, the more I'm convinced the Scrypt ASIC manufacturers are hiring shills and keeping the price of LTC up, until they can move all their hardware...
 
How ASIC manufacturers work:


  1. Announce a fast ASIC
  2. Collect pre-orders
  3. When there's enough money from the pre-orders, hire a Chinese ASIC company to develop the thing [*]
  4. When development is finished and units are available, mine with them until the difficulty is too high
  5. Tell the people who have pre-ordered that there's a technical problem, so delivery is postponed
  6. Go to 4 until people threaten to sue you
  7. Deliver or file for bankruptcy
  8. Go to 1
[*] Once the units are finished, the Chinese ASIC company does steps 4-6 internally until the manufacturer threatens to sue them
 
I agree with @crowing. I never had a single piece of mining hardware that paid off, except for GPUs.

CPUs and GPUs were great because they never lose value within a few months as badly as ASICs do.
 