atop shows that carbon-relay is eating 80-90% of USR CPU. From strace:
accept(7, {sa_family=AF_INET, sin_port=htons(60649), sin_addr=inet_addr("192.237.222.81")}, [16]) = 257
accept(7, {sa_family=AF_INET, sin_port=htons(51564), sin_addr=inet_addr("166.78.1.48")}, [16]) = 257
accept(7, 0x7ffff4679550, [16]) = -1 EAGAIN (Resource temporarily unavailable)
accept(7, {sa_family=AF_INET, sin_port=htons(33654), sin_addr=inet_addr("198.61.194.248")}, [16]) = 257
accept(7, {sa_family=AF_INET, sin_port=htons(50037), sin_addr=inet_addr("166.78.181.204")}, [16]) = 257
accept(7, 0x7ffff4679550, [16]) = -1 EAGAIN (Resource temporarily unavailable)
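The strange thing is that, even after restarting the service, strace always seems to show it sitting on fd 7. Does that mean this fd is not being cleaned up properly?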
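For reading the trace above, here is a minimal sketch of how a non-blocking accept loop behaves. It is not Twisted's actual implementation; drain_accept and demo are hypothetical names used only for illustration. The listening socket keeps one descriptor for its whole lifetime (the first argument of accept), each accepted connection gets its own descriptor, and accept fails with EAGAIN (BlockingIOError in Python 3) once the backlog is empty.

```python
import select
import socket


def drain_accept(listener, limit=100):
    """Accept pending connections until the kernel reports EAGAIN."""
    accepted = []
    for _ in range(limit):
        try:
            conn, addr = listener.accept()
        except BlockingIOError:
            # Same condition strace prints as "-1 EAGAIN": nothing left to accept.
            break
        accepted.append((conn.fileno(), addr))
        conn.close()  # closing frees the descriptor number for reuse
    return accepted


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    listener.setblocking(False)
    listen_fd = listener.fileno()
    port = listener.getsockname()[1]

    rounds = []
    for _ in range(3):
        client = socket.create_connection(("127.0.0.1", port))
        # Wait until the listener is readable, as an event loop would.
        select.select([listener], [], [], 1.0)
        rounds.append(drain_accept(listener))
        client.close()

    same_fd = listener.fileno() == listen_fd
    listener.close()
    return listen_fd, same_fd, rounds
```

Because the kernel hands out the lowest free descriptor number, a connection that is closed before the next accept can leave the same number (257 in the trace) to be returned again. A process that opens its files in the same order after a restart would also tend to get the same number for its listening socket.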
I increased the limit on the number of open files:
/proc/2891/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15834                15834                processes
Max open files            16384                16384                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15834                15834                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
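To confirm which limit a running process actually got, the limits file can be read programmatically. The sketch below assumes the usual column-aligned procfs layout, where fields are separated by runs of two or more spaces; parse_limits and open_files_limit are hypothetical helper names.

```python
import re


def parse_limits(text):
    """Map each limit name to its (soft, hard) values, kept as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Limit names contain single spaces, so split on 2+ spaces only.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def open_files_limit(pid):
    """Return the (soft, hard) 'Max open files' limit for a given PID."""
    with open("/proc/%d/limits" % pid) as fh:
        return parse_limits(fh.read())["Max open files"]
```

For the process above, open_files_limit(2891) would be expected to return ('16384', '16384') if the new limit was applied to that PID.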
After that it dropped to ~50%.
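My problem looks similar to this thread, but since we only have a few sockets in the TIME_WAIT state, I don't think enabling tw_recycle would help. As for tcp_syncookies, I don't see any related messages in syslog.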
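One way to back up the TIME_WAIT count is to tally socket states straight from /proc/net/tcp. This is a rough sketch that assumes the standard Linux layout, where the fourth whitespace-separated field is a hex state code; count_tcp_states is a hypothetical helper, and IPv6 sockets live in /proc/net/tcp6 and would need a second pass.

```python
from collections import Counter

# State codes used by the Linux kernel in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text):
    """Count sockets per TCP state from the contents of /proc/net/tcp."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3:
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_tcp_states(path="/proc/net/tcp"):
    """Read the live table; only works on Linux."""
    with open(path) as fh:
        return count_tcp_states(fh.read())
```

Calling read_tcp_states()["TIME_WAIT"] on the relay host gives the number to compare against the total connection count.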
This is what I get when I try to start carbon-relay in debug mode:
26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 50.56.249.127:48772 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.
26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 198.101.241.101:50672 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.
26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 166.78.2.167:43346 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.
This comes from twisted:
    class ConnectionLost(ConnectionClosed):
        """Connection to the other side was lost in a non-clean fashion"""

        def __str__(self):
            s = self.__doc__.strip().splitlines()[0]
            if self.args:
                s = '%s: %s' % (s, ' '.join(self.args))
            s = '%s.' % s
            return s
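To see how that class produces the text in the log lines, here is a self-contained sketch. The ConnectionClosed base below is a stand-in so the example runs without Twisted installed, and describe is a hypothetical helper; the __str__ body is the one quoted above.

```python
class ConnectionClosed(Exception):
    """Stand-in for Twisted's base class (assumption, for illustration only)."""


class ConnectionLost(ConnectionClosed):
    """Connection to the other side was lost in a non-clean fashion"""

    def __str__(self):
        # First line of the docstring is the fixed part of the message.
        s = self.__doc__.strip().splitlines()[0]
        if self.args:
            # Any constructor arguments are appended after a colon.
            s = '%s: %s' % (s, ' '.join(self.args))
        s = '%s.' % s
        return s


def describe(*args):
    """Return the message ConnectionLost would render for these args."""
    return str(ConnectionLost(*args))
```

With the single argument "Connection lost", describe returns the same suffix that appears after "lost:" in the log, so the message is the generic docstring plus that argument and does not by itself say why the peer went away.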
I also tried to debug with gdb, but pystack returned nothing.