p2pool memory leak

plambe

Hi,

I have installed p2pool from chaeplin's GitHub repo (many times already) on a Debian wheezy (7.4.0) VM. I followed this tutorial: http://www.reddit.com/r/DRKCoin/comments/1zg2c8/tutorial_how_to_set_up_a_darkcoin_p2pool_server/ with one addition: once everything else was ready, I added a testing repository and installed glibc 2.18-1.

After doing this for the first time, I noticed a huge memory leak: memory usage grows exponentially. So I reinstalled the Debian VM and followed the tutorial above again.

I was using Twisted 13.2 the whole time and, at some point, 14 (it had just been released). That doesn't seem to be the issue: on one fresh Debian install I tried Twisted 12 (the standard version for Debian wheezy) and memory usage grew even faster.

A graph can be seen here: http://plambe.ignorelist.com:7905/static/classic/graphs.html?Month

I also installed Ubuntu Server 14.04 (although I dislike Ubuntu) and followed the reddit instructions above, even though they are written for Debian. This configuration has only been running since yesterday, so I can't really say yet whether it will show the same symptoms.

Can you guys give me some ideas what to try next on the debian install?
 
I tried 3 versions of twisted on debian - 12.0.0, 13.2.0 and 14.0.0. Twisted 12.0.0 was noticeably worse than the newer two versions I tried, but I noticed no difference between 13.2.0 and 14.0.0.

I've seen a leak in most of the nodes I checked, but I never saw exponential growth like on my node, where 16 GB of RAM filled up in a few days.

Also, the links you posted are from the standard p2pool, not the drk fork, so in case this was fixed in the original, what files from chaeplin's git repo should I use to make forrestv's version compatible with the darkcoin network?

As far as I understand it, Twisted is used to manage network connections. Is the leak related to the number of connections between miners and the p2pool node? Does the number of connections between browsers and the p2pool node affect this issue?

If I restart my p2pool node every 24 hours or so, what happens to the fee payment?
 
I have a lot of "Handshake timed out" events from my other p2pool nodes (they have no miners connected, I run them for experimentation), on other VMs:
Code:
2014-05-24 00:02:14.597051 Handshake timed out, disconnecting from 10.0.2.2:48893
2014-05-24 00:04:54.774054 Handshake timed out, disconnecting from 10.0.2.2:35185
2014-05-24 00:06:45.993448 Handshake timed out, disconnecting from 10.0.2.2:33749
2014-05-24 00:07:46.764186 Handshake timed out, disconnecting from 10.0.2.2:36907
2014-05-24 00:08:00.457037 Handshake timed out, disconnecting from 10.0.2.2:48210
2014-05-24 00:08:56.935349 Handshake timed out, disconnecting from 10.0.2.2:45056
2014-05-24 00:09:31.526523 Handshake timed out, disconnecting from 10.0.2.2:40554
2014-05-24 00:10:01.042676 Handshake timed out, disconnecting from 10.0.2.2:41175
2014-05-24 00:10:32.146980 Handshake timed out, disconnecting from 10.0.2.2:44442
2014-05-24 00:11:31.625948 Handshake timed out, disconnecting from 10.0.2.2:45912

10.0.2.2 is the virtual host, through which it seems the p2pool nodes are communicating. I stopped all other p2pool node VMs and I'll see if this changes anything. I suspect this might be the issue as this message is generated right next to the code in commit 0cb07df in forrestv's git repo which is mentioned in issue "Memory leak #88" in the same repo.
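
To check whether the timeouts really all come from one peer, the log lines can be tallied per IP address. This is only a rough sketch: the regular expression is written against the log lines quoted above, and `count_handshake_timeouts` is a made-up helper name, not something that ships with p2pool.

```python
import re
from collections import Counter

# Written against lines such as:
# 2014-05-24 00:02:14.597051 Handshake timed out, disconnecting from 10.0.2.2:48893
TIMEOUT_RE = re.compile(
    r"Handshake timed out, disconnecting from (\d+\.\d+\.\d+\.\d+):\d+"
)


def count_handshake_timeouts(lines):
    """Return a Counter mapping peer IP -> number of handshake timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file (for example the `p2pool-drk/data/darkcoin/log` file mentioned elsewhere in this thread) as `lines` and printing `counts.most_common()` would show which peers time out most often.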
 
I think the memory leak is caused by miners that produce the "submitted share with ~~~" log message.
I have blocked the miner addresses that show up with that log message. Then...
p2pool code should be ....

(os 64bit linux, 8G ram)

Code:
-A INPUT -i eth0 -p tcp --dport 7903 -m string --string "XtJ2J8wNPSvWF6aimPt5244wEAhLawcni2"  --algo bm --to 15000 -j REJECT --reject-with tcp-reset
-A INPUT -i eth0 -p tcp --dport 7903 -m string --string "XsZ5vkLj6djy6LsQCxQG42awkks7toAxfP"  --algo bm --to 15000 -j REJECT --reject-with tcp-reset
-A INPUT -i eth0 -p tcp --dport 7903 -m string --string "XdQJdmFq2JbpibaegZcsopvcHsWba1hFWX"  --algo bm --to 15000 -j REJECT --reject-with tcp-reset
-A INPUT -i eth0 -p tcp --dport 7903 -m string --string "XtCE5hM4HxiQgR3EWSR8g3kqtcATWVcfev"  --algo bm --to 15000 -j REJECT --reject-with tcp-reset
-A INPUT -i eth0 -p tcp --dport 7903 -m string --string "XruEd2U8aBU2PbogChJRcCkLEcM9fuCSJi"  --algo bm --to 15000 -j REJECT --reject-with tcp-reset


[Attached images: 3TNotyT.png, zbRGqoE.png]
 
If you mean messages such as the following:
Code:
xxx:~$ tail -n 100000 p2pool-drk/data/darkcoin/log | grep "submitted share with" -A 2
2014-05-23 21:04:06.968790 Worker XXXXXXXXXXXXX submitted share with hash > target:
2014-05-23 21:04:06.970146     Hash:   b35c4979baeb4d2ff7b904464d749c2fc3097d4199f47dae89c95ea100
2014-05-23 21:04:06.970787     Target: a2ce97d4dc4fa800000000000000000000000000000000000000000000
they are quite rare in my logs, only one mention in the last 100 000 lines of logs.

Anyway, since I stopped the other two VMs with p2pool instances everything seems to be normal. I have no more of the messages like the following:
Code:
2014-05-24 00:11:31.625948 Handshake timed out, disconnecting from 10.0.2.2:45912
and my memory usage has been stable for more than an hour now: http://drk.kopame.com:7903/static/classic/graphs.html?Hour

I'll check how things are going tomorrow morning (here it's 3 am).

EDIT: removed "again" from "... everything seems to be normal again" because it was never normal for me before.
 
It seems there's still a memory leak, however at least the memory usage increases linearly (not exponentially) and more slowly for the time being. This is the pattern I've seen on other p2pool nodes.
There are no log messages containing "submitted share with" or "Handshake timed out" in the last 100 000 lines.
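
One crude way to tell "linear" from "exponential" growth without eyeballing the graph is to fit a straight line to the memory samples and another to their logarithms, then see which fits better. The sketch below assumes you have collected (time, memory) samples yourself; `classify_growth` is a hypothetical helper, and comparing the two fits this way is a heuristic rather than a proper statistical test.

```python
import math


def _r_squared(xs, ys):
    """Squared correlation of a least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, nothing to fit
    return (sxy * sxy) / (sxx * syy)


def classify_growth(times, mem_bytes):
    """Return "exponential" if log(memory) is closer to a straight line
    than memory itself, otherwise "linear". Memory values must be > 0."""
    linear_fit = _r_squared(times, mem_bytes)
    exp_fit = _r_squared(times, [math.log(m) for m in mem_bytes])
    return "exponential" if exp_fit > linear_fit else "linear"
```

With only a handful of samples, or a leak that changes rate partway through, the answer should be treated with suspicion.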
 
I had to take down my p2pool yesterday due to this issue. I guess we need a fix sooner or later.
 
The memory leak seems to increase much more rapidly with a higher hashrate. Also it seems that with dstorm's repo, the leak has gotten worse.
 
That would explain why my p2pool kept crashing for no reason.
I increased the swap file to 3 GB. Hopefully I'll have a better solution by the time that fills up. Maybe a script that automatically kills p2pool and restarts it after it hits a certain amount of memory.
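
A minimal sketch of such a watchdog, assuming Linux (it reads `VmRSS` from `/proc/<pid>/status`). The function names are made up for illustration, and `restart` is a callable you would have to supply yourself, since how p2pool gets stopped and started depends on your setup.

```python
import re


def parse_rss_kib(status_text):
    """Extract VmRSS in KiB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def should_restart(rss_kib, limit_mib):
    """True if resident memory is known and above the limit (in MiB)."""
    return rss_kib is not None and rss_kib > limit_mib * 1024


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open("/proc/%d/status" % pid) as status_file:
        return parse_rss_kib(status_file.read())


def check_once(pid, limit_mib, restart):
    """Call restart() if the process is over the limit.

    `restart` is a user-supplied callable (an assumption of this sketch),
    for example a function that runs your own stop/start commands.
    Returns True if restart() was called.
    """
    if should_restart(read_rss_kib(pid), limit_mib):
        restart()
        return True
    return False
```

Running `check_once` from cron or in a loop with a sleep would give the "kill and restart at a threshold" behaviour. It only hides the leak, of course.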
 
Wouldn't the shares from the other p2pools cause it to go back to a high-usage state?

[Edit]:
What if there was a way to block a user for at least 30 seconds after they submit a bad share (one with a hash higher than the target)?
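
The bookkeeping for a timed block like that could look roughly like this. It is a sketch only: `TempBanList` is a hypothetical class, and where it would hook into p2pool's share handling is not shown here.

```python
import time


class TempBanList:
    """Remember workers that sent a bad share and block them for a while."""

    def __init__(self, ban_seconds=30.0, clock=time.monotonic):
        self.ban_seconds = ban_seconds
        self._clock = clock          # injectable so the class can be tested
        self._banned_until = {}      # worker name -> expiry time

    def report_bad_share(self, worker):
        """Start (or renew) the ban for this worker."""
        self._banned_until[worker] = self._clock() + self.ban_seconds

    def is_banned(self, worker):
        """True while the worker's ban has not expired yet."""
        expires = self._banned_until.get(worker)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._banned_until[worker]  # ban is over, forget the entry
            return False
        return True
```

Expired entries are dropped on lookup, so the ban list itself stays small.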
 
At the moment I'm banning them manually using this.
Code:
sudo iptables -A INPUT -i eth0 -p tcp --dport 7903 -m string --string "darkcoin address"  --algo bm --to 15000 -j REJECT --reject-with tcp-reset

It would be possible to have a script scan the p2pool log and run the above command automatically.
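
A rough sketch of that idea, assuming the log format quoted earlier in this thread ("Worker ... submitted share with hash > target:"). It only builds the iptables argument lists and does not run anything; `find_bad_workers` and `ban_commands` are made-up names, and the interface and port defaults are simply copied from the command above.

```python
import re

# Written against lines such as:
# 2014-05-23 21:04:06.968790 Worker XXXXXXXXXXXXX submitted share with hash > target:
BAD_SHARE_RE = re.compile(r"Worker (\S+) submitted share with hash > target")
# Only accept plain alphanumeric names, so nothing odd ends up in a rule.
SAFE_NAME_RE = re.compile(r"[A-Za-z0-9]+")


def find_bad_workers(lines):
    """Return worker names that submitted a bad share, in first-seen order."""
    workers = []
    for line in lines:
        match = BAD_SHARE_RE.search(line)
        if not match:
            continue
        name = match.group(1)
        if SAFE_NAME_RE.fullmatch(name) and name not in workers:
            workers.append(name)
    return workers


def ban_commands(lines, already_banned=(), iface="eth0", port=7903):
    """Build one iptables argument list per newly seen bad worker."""
    return [
        ["iptables", "-A", "INPUT", "-i", iface, "-p", "tcp",
         "--dport", str(port), "-m", "string", "--string", worker,
         "--algo", "bm", "--to", "15000",
         "-j", "REJECT", "--reject-with", "tcp-reset"]
        for worker in find_bad_workers(lines)
        if worker not in already_banned
    ]
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` as root. Keeping track of `already_banned` between runs matters, otherwise the same rule gets appended again every time.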
 
I can write a bash script doing that, but I think that would be an ugly workaround. I wouldn't implement it because the biggest miner at my node is one of the problematic kind - if I get anything from fees, it's mostly from this worker.

A more elaborate workaround is brewing in my mind: with a very similar iptables command I can redirect the problematic worker's traffic to a stratum proxy. The proxy works with x11 p2pool after I patched p2pool a bit, because it wasn't stratum-compliant: when a worker (in this case the proxy) subscribes, the JSON reply has "result: null" rather than "result: true". I don't know yet whether this would work, but the idea is that (I think) the stratum proxy checks the shares and only then forwards them to the pool, in this case my p2pool node. So if the memory leak is related to shares with hash > target, those might get stopped at the proxy, preventing the leak. I am still thinking this through.
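
The check the proxy would have to make before forwarding boils down to comparing two numbers. A minimal sketch, assuming the hash and target are available as hex strings like the ones p2pool prints in its log; a real proxy would also have to compute the X11 hash of the submitted work itself, which is left out here, and both function names are hypothetical.

```python
def share_meets_target(hash_hex, target_hex):
    """True if the hash, read as an integer, is at or below the target."""
    return int(hash_hex, 16) <= int(target_hex, 16)


def forwardable(shares):
    """Keep only the (worker, hash_hex, target_hex) tuples worth forwarding.

    Shares with hash > target are dropped instead of being passed on
    to the pool.
    """
    return [share for share in shares if share_meets_target(share[1], share[2])]
```

Whether dropping these shares at the proxy actually avoids the leak is exactly the open question.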

How are we certain the memory leak is related to the hash > target issue?
 
Not certain. Memory usage seems to increase dramatically with a higher hash rate though.
Edit: Memory usage currently seems to have capped at a little above 2 GB.
 