我正在运行Phontomjs和Selenium的python脚本。我正面临超时问题。它在20-50分钟后停止。我需要一个解决方案,以便我可以运行我的脚本没有这个超时问题。问题在哪里,我该如何解决?
The input file cannot be read or no in proper format.
Traceback (most recent call last):
File "links_crawler.py", line 147, in <module>
crawler.Run()
File "links_crawler.py", line 71, in Run
self.checkForNextPages()
File "links_crawler.py", line 104, in checkForNextPages
self.next.click()
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 75, in click
self._execute(Command.CLICK_ELEMENT)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 454, in _execute
return self._parent.execute(command, params)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 199, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
return self._request(command_info[0], url, body=data)
File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
resp = opener.open(request, timeout=self._timeout)
File "/usr/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1227, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1200, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 453, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine: ''
代码:
class Crawler():
def __init__(self,where_to_save, verbose = 0):
self.link_to_explore = ''
self.TAG_RE = re.compile(r'<[^>]+>')
self.TAG_SCRIPT = re.compile(r'<(script).*?</\1>(?s)')
if verbose == 1:
self.driver = webdriver.Firefox()
else:
self.driver = webdriver.PhantomJS()
self.links = []
self.next = True
self.where_to_save = where_to_save
self.logs = self.where_to_save + "/logs"
self.outputs = self.where_to_save + "/outputs"
self.logfile = ''
self.rnd = 0
try:
os.stat(self.logs)
except:
os.makedirs(self.logs)
try:
os.stat(self.outputs)
except:
os.makedirs(self.outputs)
try:
fin = open(file_to_read,"r")
FileContent = fin.read()
fin.close()
crawler =Crawler(where_to_save)
data = FileContent.split("\n")
for info in data:
if info!="":
to_process = info.split("|")
link = to_process[0].strip()
category = to_process[1].strip().replace(' ','_')
print "Processing the link: " + link : " + info
crawler.Init(link,category)
crawler.Run()
crawler.End()
crawler.closeSpider()
except:
print "The input file cannot be read or no in proper format."
raise
答案 0 :(得分:0)
如果您不希望Timeout停止脚本,则可以捕获异常
public class DoctorTestWorkRequest extends WorkRequestNew
并传递它。
您可以使用selenium.common.exceptions.TimeoutException
set_page_load_timeout()
方法设置默认页面加载超时。
喜欢这个
webdriver
如果您的页面在10秒内没有加载,则会抛出TimeoutException。
编辑: 忘记提到你必须把你的代码放在一个循环中。
添加导入
driver.set_page_load_timeout(10)
from selenium.common.exceptions import TimeoutException