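I'm attempting to answer the following question out of personal interest: What is the fastest way to send 100,000 HTTP requests in Python?
This is what I've come up with so far, but I'm running into something very strange.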
When installSignalHandlers is False, it just hangs. I can see that the DelayedCall instances are in reactor._newTimedCalls, but processResponse never gets called.
When installSignalHandlers is True, it throws an error and works.
    from twisted.internet import reactor
    from twisted.web.client import Agent
    from threading import Semaphore, Thread
    import time

    concurrent = 100
    s = Semaphore(concurrent)
    reactor.suggestThreadPoolSize(concurrent)
    t=Thread(
        target=reactor.run,
        kwargs={'installSignalHandlers':True})
    t.daemon=True
    t.start()
    agent = Agent(reactor)

    def processResponse(response,url):
        print response.code, url
        s.release()

    def processError(response,url):
        print "error", url
        s.release()

    def addTask(url):
        req = agent.request('HEAD', url)
        req.addCallback(processResponse, url)
        req.addErrback(processError, url)

    for url in open('urllist.txt'):
        addTask(url.strip())
        s.acquire()
    while s._Semaphore__value!=concurrent:
        time.sleep(0.1)
    reactor.stop()
Here is the error that is thrown when installSignalHandlers is True: (Note: this is the expected behaviour! The problem is that it doesn't work when installSignalHandlers is False.)
    Traceback (most recent call last):
      File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 396, in fireEvent
        DeferredList(beforeResults).addCallback(self._continueFiring)
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 224, in addCallback
        callbackKeywords=kw)
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 213, in addCallbacks
        self._runCallbacks()
      File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
        self.result = callback(self.result, *args, **kw)
    --- <exception caught here> ---
      File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 409, in _continueFiring
        callable(*args, **kwargs)
      File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1165, in _reallyStartRunning
        self._handleSignals()
      File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1105, in _handleSignals
        signal.signal(signal.SIGINT, self.sigInt)
    exceptions.ValueError: signal only works in main thread
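The ValueError on the last line of the traceback does not depend on Twisted: CPython only allows signal handlers to be registered from the main thread. The following is a minimal sketch that reproduces it with the standard library alone. It is written for Python 3 (unlike the Python 2 code in this question), and the function name `install_sigint_handler_in_thread` is made up for the illustration.

```python
import signal
import threading


def install_sigint_handler_in_thread():
    """Try to register a SIGINT handler from a worker thread.

    Returns the ValueError message, or None if no error was raised.
    (Hypothetical helper, for illustration only.)
    """
    outcome = {}

    def worker():
        try:
            # The same kind of call that the reactor makes in _handleSignals.
            signal.signal(signal.SIGINT, signal.default_int_handler)
        except ValueError as exc:
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome.get("error")
```

Calling `install_sigint_handler_in_thread()` should return a message that mentions the main thread, which is what happens when `reactor.run` is asked to install signal handlers from a non-main thread.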
What am I doing wrong, and what is the right way? I'm new to Twisted.
@moshez: Thanks. It works now:
    from twisted.internet import reactor, threads
    from urlparse import urlparse
    import httplib
    import itertools

    concurrent = 100
    finished=itertools.count(1)
    reactor.suggestThreadPoolSize(concurrent)

    def getStatus(ourl):
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(url.netloc)
        conn.request("HEAD", url.path)
        res = conn.getresponse()
        return res.status

    def processResponse(response,url):
        print response, url
        processedOne()

    def processError(error,url):
        print "error", url#, error
        processedOne()

    def processedOne():
        if finished.next()==added:
            reactor.stop()

    def addTask(url):
        req = threads.deferToThread(getStatus, url)
        req.addCallback(processResponse, url)
        req.addErrback(processError, url)

    added=0
    for url in open('urllist.txt'):
        added+=1
        addTask(url.strip())

    try:
        reactor.run()
    except KeyboardInterrupt:
        reactor.stop()
Answer 0 (score: 6)
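You are making waaaaay too many "reactor calls" from the main thread (agent.request, for example, is most likely calling into the reactor). I'm not sure whether that is your problem, but it is unsupported either way: the only reactor call you may make from a non-reactor thread is reactor.callFromThread.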
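Also, the whole architecture seems weird. Why aren't you running the reactor on the main thread? Reading a whole file of 10,000 requests and dispatching them should not be a problem from within the reactor, even if you do it all at once.
You can probably stick with a pure-Twisted solution that uses no threads at all.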