Python gets stuck loading a page with mechanize

Time: 2015-01-12 08:43:24

Tags: python python-2.7 web-scraping mechanize

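I've run into a strange problem when using mechanize to open and process a large number of pages (1000+) on a website. Every now and then a page load hangs without ever timing out. The problem doesn't seem to be specific to any page: if I run the script again and open the same page, it works like a charm. It seems to happen at random.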

I am using this function to open pages:

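import time

def openMechanize(br, url):
    # Retry until the page loads; br is the mechanize browser object.
    while True:
        try:
            print time.localtime()
            print "opening: " + url

            # The hang happens inside this call, despite the timeout.
            resp = br.open(url, timeout = 2.5)

            print "done\n"

            return resp

        except Exception as errormsg:
            print repr(errormsg)

            print "failed to load page, retrying"
            time.sleep(0.5)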

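When it hangs, it produces the first prints (the current time and "opening: " plus the URL) but never reaches the second print ("done"). I have left it running for several hours and nothing happens.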

When I interrupt the script with Ctrl+C while it is stuck, I get the following output:

  File "test.py", line 143, in openMechanize
    resp = br.open(url, timeout = 2.5)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_opener.py", line 193, in open
    response = urlopen(self, req, data)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 344, in _open
    '_open', req)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1142, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1116, in do_open
    r = h.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
KeyboardInterrupt

Looking at socket.py at the point where it gets stuck, I see the following:

        self._rbuf = StringIO()  # reset _rbuf.  we consume it via buf.
        while True:
            try:
                data = self._sock.recv(self._rbufsize)
            except error, e:
                if e.args[0] == EINTR:
                    continue
                raise

It looks like it gets stuck in an endless loop because recv fails for some reason.
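
To check that guess, here is a minimal sketch of how recv behaves against a peer that accepts the connection but never sends anything. The helper name `recv_from_silent_peer` and the local silent server are made up for illustration and are not part of mechanize or my script. As far as I can tell, with a timeout set on the socket, recv should raise `socket.timeout` instead of spinning in that loop, because the loop only retries on EINTR.

```python
import socket

def recv_from_silent_peer(timeout_seconds):
    """Connect to a local server that never sends, then try to read from it."""
    # Hypothetical stand-in for a web server that accepts the connection
    # but never sends a response.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout_seconds)  # without this, recv blocks indefinitely
    try:
        # The OS completes the handshake even though accept() is never called.
        client.connect(server.getsockname())
        try:
            client.recv(8192)
            return "data or EOF"
        except socket.timeout:
            return "timed out"
    finally:
        client.close()
        server.close()

result = recv_from_silent_peer(0.2)
print(result)
```

If the `timeout = 2.5` I pass to `br.open` really reached the underlying socket, I would expect the same exception in my script, so the hang may mean the socket ended up without a timeout. That is only a guess on my part.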

Has anyone run into this error and found some kind of workaround?
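
One stopgap that might be worth trying, sketched here under assumptions, is a hard wall-clock deadline around the call using `signal.setitimer`. The names `HardTimeout` and `call_with_deadline` are hypothetical, and the approach only works on Unix and in the main thread. Since Ctrl+C manages to break out of the stuck recv, a SIGALRM handler that raises should be able to do the same.

```python
import signal
import time

class HardTimeout(Exception):
    """Raised when the wrapped call exceeds its wall-clock deadline."""

def _raise_timeout(signum, frame):
    raise HardTimeout("call exceeded its deadline")

def call_with_deadline(func, seconds, *args, **kwargs):
    # Unix only, main thread only: SIGALRM interrupts the blocking call
    # and the handler raises HardTimeout in its place.
    old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(*args, **kwargs)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)      # cancel the timer
        signal.signal(signal.SIGALRM, old_handler)   # restore the old handler

# Demonstration with a call that would otherwise block for 5 seconds.
try:
    call_with_deadline(time.sleep, 0.2, 5)
    outcome = "finished"
except HardTimeout:
    outcome = "interrupted"
print(outcome)
```

In my function this would mean replacing the `br.open` line with something like `resp = call_with_deadline(br.open, 10, url, timeout = 2.5)`. Because `HardTimeout` subclasses `Exception`, the existing except clause would catch it and retry. This would hide the hang rather than explain it.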

0 answers:

No answers yet