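I wrote a Python function that rates a website against some parameters (a list of words). The function uses Python Mechanize, and it works fine most of the time.
However, for some sites it just hangs until I press Ctrl+C in the terminal. My guess is that this is some kind of JavaScript-related problem. Is there a way to build a timeout around it?
Here is my function: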
    import mechanize

    def rateSite(site_url, comparisonWords):
        # open the site
        localBrowser = mechanize.Browser()
        localBrowser.addheaders = [('User-agent', 'Mozilla/5.1 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/9.0.1')]
        localBrowser.set_handle_robots(False)
        site = localBrowser.open(site_url, timeout=5000)
        html = site.read()
        # rate the site
        rating = 0  # placeholder start value; the real rating math is omitted here
        for i in comparisonWords.split():
            pass  # do some rating math
        return rating
Here is the traceback I get when I press Ctrl+C:
        site=localBrowser.open(site_url,timeout=5000)
      File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 209, in open
        return self._mech_open(url, data, timeout=timeout)
      File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 236, in _mech_open
        response = UserAgentBase.open(self, request, data)
      File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 202, in open
        response = meth(req, response)
      File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 612, in http_response
        "http", request, response, code, msg, hdrs)
      File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 219, in error
        result = apply(self._call_chain, args)
      File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 146, in http_error_302
        return self.parent.open(new)
      File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 209, in open
        return self._mech_open(url, data, timeout=timeout)
      File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 236, in _mech_open
        response = UserAgentBase.open(self, request, data)
      File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 202, in open
        response = meth(req, response)
      File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 612, in http_response
        "http", request, response, code, msg, hdrs)
      File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 219, in error
        result = apply(self._call_chain, args)
      File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 146, in http_error_302
        return self.parent.open(new)
      File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 209, in open
        return self._mech_open(url, data, timeout=timeout)
      File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 236, in _mech_open
        response = UserAgentBase.open(self, request, data)
      File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 202, in open
        response = meth(req, response)
      File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 612, in http_response
        "http", request, response, code, msg, hdrs)
      File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 219, in error
        result = apply(self._call_chain, args)
      File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 146, in http_error_302
        return self.parent.open(new)
      File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 209, in open
        return self._mech_open(url, data, timeout=timeout)
      File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 236, in _mech_open
        response = UserAgentBase.open(self, request, data)
      File "/usr/lib/python2.7/dist-packages/mechanize/_opener.py", line 202, in open
        response = meth(req, response)
      File "/usr/lib/python2.7/dist-packages/mechanize/_http.py", line 578, in http_response
        self._sleep(pause)
    KeyboardInterrupt
Any help on how to fix this, or how to build a timeout for it, would be greatly appreciated.
Thanks!
Answer 0 (score: 1)
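timeout=5000 is more than an hour, because the value is given in seconds; you probably meant timeout=5.
By default, mechanize follows up to 10 redirects before giving up; see HTTPRedirectHandler.max_redirections.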
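
The last frame of the traceback in the question is a sleep inside mechanize's response handling, so a per-request socket timeout on its own may not cut that wait short. If an overall time limit for the whole rating call is wanted, one option is to run the call in a helper thread and stop waiting after a deadline. The following is only a minimal sketch using the standard library; call_with_deadline, DeadlineExceeded and slow_stand_in are hypothetical names made up for this illustration, not part of mechanize, and slow_stand_in merely sleeps in place of rateSite.

```python
import threading
import time


class DeadlineExceeded(Exception):
    """Raised when the wrapped call does not finish in time."""


def call_with_deadline(func, seconds, *args, **kwargs):
    """Run func(*args, **kwargs) and give up waiting after `seconds`."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:
            # Hand the error back so the caller sees it.
            outcome["error"] = exc

    # Daemon thread: an abandoned call will not keep the interpreter alive.
    thread = threading.Thread(target=worker)
    thread.daemon = True
    thread.start()
    thread.join(seconds)

    if thread.is_alive():
        raise DeadlineExceeded("no result after %s seconds" % seconds)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def slow_stand_in(delay):
    # Hypothetical stand-in for rateSite(); it only sleeps.
    time.sleep(delay)
    return "done"


if __name__ == "__main__":
    # Finishes well inside the one-second deadline.
    print(call_with_deadline(slow_stand_in, 1.0, 0.01))
    try:
        # Would take two seconds, but we only wait 0.1 seconds.
        call_with_deadline(slow_stand_in, 0.1, 2)
    except DeadlineExceeded as exc:
        print("gave up:", exc)
```

With the function from the question, the call would look something like call_with_deadline(rateSite, 30, site_url, comparisonWords). Note that this only stops the caller from waiting: the helper thread is not killed and keeps running in the background until the blocked call returns or the program exits, so it is a workaround rather than a true cancellation.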