How can I scale my Twisted server to handle tens of thousands of concurrent SSL socket connections?
The first few hundred clients connect relatively quickly, but as the count approaches 3000 it slows to a crawl, at roughly 2 connections per second.
I use the following loop for load testing:
```python
import socket
import ssl

clients = []
for i in xrange(connections):
    print i
    clients.append(
        ssl.wrap_socket(
            socket.socket(socket.AF_INET, socket.SOCK_STREAM),
            ca_certs="server.crt",
            cert_reqs=ssl.CERT_REQUIRED
        )
    )
    clients[i].connect(('localhost', 9999))
```
cProfile output:
```
296644049 function calls (296407530 primitive calls) in 3070.656 seconds
Ordered by: cumulative time
ncalls    tottime  percall  cumtime  percall filename:lineno(function)
1           0.001    0.001 3070.656 3070.656 server.py:7(<module>)
1           0.000    0.000 3070.408 3070.408 server.py:148(main)
1           0.000    0.000 3070.406 3070.406 server.py:106(run)
1           0.000    0.000 3070.405 3070.405 base.py:1190(run)
1           0.047    0.047 3070.404 3070.404 base.py:1195(mainLoop)
34383       0.090    0.000 3070.263    0.089 epollreactor.py:367(doPoll)
38696       0.064    0.000 3066.883    0.079 log.py:75(callWithLogger)
38696       0.077    0.000 3066.797    0.079 log.py:70(callWithContext)
38696       0.035    0.000 3066.598    0.079 context.py:117(callWithContext)
38696       0.056    0.000 3066.556    0.079 context.py:61(callWithContext)
38695       0.093    0.000 3066.486    0.079 posixbase.py:572(_doReadOrWrite)
8599     1249.585    0.145 3019.333    0.351 protocol.py:114(getClientsDict)
37582010 1681.445    0.000 1681.445    0.000 {method 'items' of 'dict' objects}
21496       0.114    0.000 1535.798    0.071 tls.py:346(_flushReceiveBIO)
21496       0.026    0.000 1535.793    0.071 tcp.py:199(doRead)
21496       0.017    0.000 1535.718    0.071 tcp.py:218(_dataReceived)
17197       0.033    0.000 1535.701    0.089 tls.py:400(dataReceived)
8597        0.009    0.000 1531.480    0.178 policies.py:119(dataReceived)
8597        0.078    0.000 1531.471    0.178 protocol.py:65(dataReceived)
4300        0.029    0.000 1525.117    0.355 posixbase.py:242(_disconnectSelectable)
4300        0.030    0.000 1524.922    0.355 tcp.py:283(connectionLost)
4300        0.024    0.000 1524.659    0.355 tls.py:463(connectionLost)
4300        0.010    0.000 1524.492    0.355 policies.py:123(connectionLost)
4300        0.119    0.000 1524.471    0.355 protocol.py:50(connectionLost)
4299        0.027    0.000 1523.698    0.354 tcp.py:270(readConnectionLost)
4299        0.135    0.000 1520.228    0.354 protocol.py:88(handleInitialState)
74840519   31.487    0.000   44.916    0.000 __init__.py:348(__getattr__)
```
Reactor startup code:
```python
def run(self):
    contextFactory = ssl.DefaultOpenSSLContextFactory(self._key, self._cert)
    reactor.listenSSL(self._port, BrakersFactory(), contextFactory)
    reactor.run()
```
Answer 0 (score: 2)
Given the lack of code in the question, I put something together to see whether I could reproduce the effect you describe. Based on this experiment, the first thing I would suggest is to check the memory utilization on your machine while the script is running.
I spun up a standard Google Compute Engine instance (1 vCPU, 3.8 GB RAM, Debian Wheezy with backports; `apt-get update; apt-get install python-twisted`) and ran the following (terrible hack of) code.
(Note: to run this I needed to do `ulimit -n 4096` in both the client and server shells, otherwise I would start hitting "too many open files", i.e. Socket accept - "Too many open files".)
serv.py
```python
#!/usr/bin/python
from twisted.internet import ssl, reactor
from twisted.internet.protocol import ServerFactory, Protocol

class Echo(Protocol):
    def connectionMade(self):
        self.factory.clients.append(self)
        print "Currently %d open connections.\n" % len(self.factory.clients)

    def connectionLost(self, reason):
        self.factory.clients.remove(self)
        print "Lost connection"

    def dataReceived(self, data):
        """As soon as any data is received, write it back."""
        self.transport.write(data)

class MyServerFactory(ServerFactory):
    protocol = Echo

    def __init__(self):
        self.clients = []

if __name__ == '__main__':
    factory = MyServerFactory()
    reactor.listenSSL(8000, factory,
                      ssl.DefaultOpenSSLContextFactory(
                          'keys/server.key', 'keys/server.crt'))
    reactor.run()
```
cli.py
```python
#!/usr/bin/python
from twisted.internet import ssl, reactor
from twisted.internet.protocol import ClientFactory, Protocol

class EchoClient(Protocol):
    def connectionMade(self):
        print "hello, world"
        # The following delay is there because as soon as the write
        # happens the server will close the connection
        reactor.callLater(60, self.transport.write, "hello, world!")

    def dataReceived(self, data):
        print "Server said:", data
        self.transport.loseConnection()

class EchoClientFactory(ClientFactory):
    protocol = EchoClient

    def __init__(self):
        self.stopping = False

    def clientConnectionFailed(self, connector, reason):
        print "Connection failed - reason ", reason
        if not self.stopping:
            self.stopping = True
            reactor.callLater(10, reactor.stop)

    def clientConnectionLost(self, connector, reason):
        print "Connection lost - goodbye!"
        if not self.stopping:
            self.stopping = True
            reactor.callLater(10, reactor.stop)

if __name__ == '__main__':
    connections = 4000
    factory = EchoClientFactory()
    for i in xrange(connections):
        # The following could certainly be done more elegantly, but I
        # believe it's a legit use, and given the list is finite, it
        # shouldn't be too resource-intensive... ?
        reactor.callLater(i / float(400), reactor.connectSSL,
                          'xx.xx.xx.xx', 8000, factory,
                          ssl.ClientContextFactory())
    reactor.run()
```
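The `i / float(400)` delay staggers the connection attempts at roughly 400 per second, so 4000 clients are scheduled over about 10 seconds. A quick sketch of the schedule that loop implies:

```python
# Reproduce the callLater stagger from cli.py: one delay per client.
connections = 4000
rate = 400.0  # connections per second implied by i / float(400)
delays = [i / rate for i in range(connections)]

print(delays[0])    # 0.0 -- first client fires immediately
print(delays[-1])   # 9.9975 -- last client is scheduled ~10 s out
```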
Once this was running and got past about 2544 connections, my machine bogged down so badly that it was hard to collect data from it. But given that new ssh sessions returned "/bin/bash: Cannot allocate memory", and that when I did get in, my serv.py had 2 GB resident while the client had 1.4 GB, I think it's fair to say I blew through the RAM.
Given that the code above is just a quick hack, I may well have glaring bugs causing the memory issues. Still, I thought I'd offer the idea, since driving your machine into swap is certainly a good way to make your application crawl. (Perhaps you and I share the same bug.)
(By the way, clever Twisted folks out there, I welcome comments on what I did wrong that burned so much RAM.)
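One way to confirm this kind of memory blow-up from inside the process, rather than from a wedged shell, is to log the peak resident set size periodically. A minimal sketch using the standard-library `resource` module (Unix-only; note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS):

```python
import resource

def peak_rss():
    # Peak resident set size of this process so far
    # (kilobytes on Linux, bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print("peak RSS so far: %d" % peak_rss())
```

In a Twisted server this could be scheduled with `task.LoopingCall` so the number is logged every few seconds while the load test runs.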
Answer 1 (score: 1)
I managed to determine the cause of the protocol slowdown.
As can be seen in the cProfile output above, most of the tottime is spent in the getClientsDict() method:
```
296644049 function calls (296407530 primitive calls) in 3070.656 seconds
Ordered by: cumulative time
ncalls    tottime  percall  cumtime  percall filename:lineno(function)
8599     1249.585    0.145 3019.333    0.351 protocol.py:114(getClientsDict)
37582010 1681.445    0.000 1681.445    0.000 {method 'items' of 'dict' objects}
```
The problem is caused by the following code:
```python
def getClientsDict(self):
    rc = {1: {}, 2: {}}
    for r in self.factory._clients[1]:
        rc[1] = dict(rc[1].items() +
                     {r.getDict[1]['id']:
                      r.getDict[1]['address']}.items())
    for m in self.factory._clients[2]:
        rc[2] = dict(rc[2].items() +
                     {m.getDict[2]['id']:
                      m.getDict[2]['address']}.items())
    return rc
```
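The `dict(rc[n].items() + {...}.items())` pattern rebuilds the accumulated dict from scratch for every client, so the method's cost grows quadratically with the number of connections, which matches the 37 million `dict.items` calls in the profile. A sketch of a linear-time equivalent, written as a standalone function with hypothetical client entries standing in for `r.getDict[n]`:

```python
def build_clients_dict(clients_by_group):
    """Linear-time equivalent: one direct key assignment per client
    instead of rebuilding the accumulated dict on every iteration."""
    rc = {1: {}, 2: {}}
    for group in (1, 2):
        for entry in clients_by_group[group]:
            rc[group][entry['id']] = entry['address']
    return rc

# Hypothetical entries standing in for r.getDict[1] / m.getDict[2]
clients = {1: [{'id': 'a', 'address': '10.0.0.1'}],
           2: [{'id': 'b', 'address': '10.0.0.2'}]}
print(build_clients_dict(clients))
# {1: {'a': '10.0.0.1'}, 2: {'b': '10.0.0.2'}}
```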