Annoying Twisted Python problem

Date: 2010-04-14 01:15:33

Tags: python twisted reactor

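Out of personal interest, I'm trying to answer the following question: What is the fastest way to send 100,000 HTTP requests in Python?

This is what I have come up with so far, but I'm running into something very puzzling.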

When installSignalHandlers is True, it throws an error but works.

When installSignalHandlers is False, it just hangs. I can see the DelayedCall instances sitting in reactor._newTimedCalls, but processResponse never gets called.

from twisted.internet import reactor
from twisted.web.client import Agent
from threading import Semaphore, Thread
import time

concurrent = 100
s = Semaphore(concurrent)
reactor.suggestThreadPoolSize(concurrent)
t=Thread(
    target=reactor.run,
    kwargs={'installSignalHandlers':True})
t.daemon=True
t.start()


agent = Agent(reactor)


def processResponse(response,url):
    print response.code, url
    s.release()

def processError(response,url):
    print "error", url
    s.release()

def addTask(url):
    req = agent.request('HEAD', url)
    req.addCallback(processResponse, url)
    req.addErrback(processError, url)


for url in open('urllist.txt'):
    addTask(url.strip())    
    s.acquire()
while s._Semaphore__value!=concurrent:
    time.sleep(0.1)     

reactor.stop()

Here is the error thrown when installSignalHandlers is True. (Note: this is the expected behavior! The question is why it doesn't work when installSignalHandlers is False.)

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 396, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 224, in addCallback
    callbackKeywords=kw)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 213, in addCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 409, in _continueFiring
    callable(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1165, in _reallyStartRunning
    self._handleSignals()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1105, in _handleSignals
    signal.signal(signal.SIGINT, self.sigInt)
exceptions.ValueError: signal only works in main thread
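
The ValueError on the last line of the traceback comes from Python's own signal module, not from Twisted, so it can be reproduced with the standard library alone. The following is a minimal sketch, assuming Python 3; the helper names install_sigint_handler and try_in_worker_thread are made up for illustration and appear nowhere in the question.

```python
import signal
import threading


def install_sigint_handler(errors):
    """Try to register a SIGINT handler; record any ValueError raised."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        # Python only allows signal.signal() in the main thread.
        errors.append(str(exc))


def try_in_worker_thread():
    """Run install_sigint_handler in a non-main thread; return the errors."""
    errors = []
    worker = threading.Thread(target=install_sigint_handler, args=(errors,))
    worker.start()
    worker.join()
    return errors
```

Calling try_in_worker_thread() should return a list holding one message that mentions the main thread. That is the same restriction reactor.run hits when it is started in a background thread with installSignalHandlers set to True.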

What am I doing wrong, and what is the right way to do this? I'm new to Twisted.

@moshez: Thanks. It works now:

from twisted.internet import reactor, threads
from urlparse import urlparse
import httplib
import itertools


concurrent = 100
finished=itertools.count(1)
reactor.suggestThreadPoolSize(concurrent)

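def getStatus(ourl):
    # Runs in a thread-pool thread, so blocking httplib calls are fine here.
    url = urlparse(ourl)
    conn = httplib.HTTPConnection(url.netloc)
    # A URL with no path (e.g. http://example.com) still needs "/" in the request line.
    conn.request("HEAD", url.path or "/")
    res = conn.getresponse()
    return res.status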

def processResponse(response,url):
    print response, url
    processedOne()

def processError(error,url):
    print "error", url#, error
    processedOne()

def processedOne():
    if finished.next()==added:
        reactor.stop()

def addTask(url):
    req = threads.deferToThread(getStatus, url)
    req.addCallback(processResponse, url)
    req.addErrback(processError, url)   

added=0
for url in open('urllist.txt'):
    added+=1
    addTask(url.strip())

try:
    reactor.run()
except KeyboardInterrupt:
    reactor.stop()

1 answer:

Answer 0 (score: 6)

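You are making waaaaay too many reactor calls from the main thread (agent.request, for example, most likely calls into the reactor). I'm not sure that is your problem, but it is unsupported either way: the only reactor call that may be made from a non-reactor thread is reactor.callFromThread.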

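Also, the whole architecture seems odd. Why not run the reactor on the main thread? Reading an entire file of 10,000 requests and splitting them up should not be a problem for the reactor, even if you do it all in one go.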

You can use a pure Twisted solution that uses no threads at all.