在网站上使用机械化打开和处理很多页面(1000+)时,我遇到了一个奇怪的问题。我时不时地试图加载一个页面而没有超时,这个问题似乎不是特定于页面的,就像我再次运行它并尝试打开它作为魅力的同一页面,但似乎是随意发生。
我正在使用此功能打开页面
def openMechanize(br, url):
while True:
try:
print time.localtime()
print "opening: " + url
resp = br.open(url, timeout = 2.5)
print "done\n"
return resp
except Exception, errormsg:
print repr(errormsg)
print "failed to load page, retrying"
time.sleep(0.5)
当它卡住时,它会产生第一个打印,当前时间和打开网址,但永远不会进入第二个打印。我试图让它运行几个小时但没有任何反应。
当ctrl + c被卡住时中断脚本时,我得到以下输出:
File "test.py", line 143, in openMechanize
resp = br.open(url, timeout = 2.5)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 230, in _mech_open
response = UserAgentBase.open(self, request, data)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_opener.py", line 193, in open
response = urlopen(self, req, data)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 344, in _open
'_open', req)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
result = func(*args)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1142, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1116, in do_open
r = h.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
KeyboardInterrupt
在检查socket.py时,它会卡住,我看到以下内容:
self._rbuf = StringIO() # reset _rbuf. we consume it via buf.
while True:
try:
data = self._sock.recv(self._rbufsize)
except error, e:
if e.args[0] == EINTR:
continue
raise
由于某些原因导致recv崩溃
,看起来它会陷入无休止的循环中是否有人遇到此错误并找到了某种解决方法?