Python Selenium Firefox: get the page content before the page stops loading

Posted: 2019-11-25 04:45:55

Tags: python selenium-webdriver

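I have found a few solutions, but none of them work for me. What I want is for the current page content to still be available if driver.get times out. On Facebook and some other sites, the JavaScript takes a very long time to load. Is there a way to do this?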

This is how I create the driver:

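# Selenium 3.x API (options.headless and firefox_profile= are deprecated in Selenium 4)
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = True
firefox_profile = webdriver.FirefoxProfile()
# Firefox preferences intended to cap response wait and script run time at 10 s
firefox_profile.set_preference("http.response.timeout", 10)
firefox_profile.set_preference("dom.max_script_run_time", 10)
driver = webdriver.Firefox(options=options, firefox_profile=firefox_profile)
driver.set_page_load_timeout(10)  # stop waiting for the page load after 10 s
driver.get(url)  # url is defined elsewhere in my code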

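I expected the dom.max_script_run_time preference to stop scripts after 10 seconds, but it doesn't seem to. I also expected driver.set_page_load_timeout() to do what I need. I have also tried sending the Escape key, as described in another post, but that can only happen after the exception is raised, and after the exception the driver exits. So how can I handle the exception in the driver and get the page content without the driver being closed?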
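Another approach sidesteps the page-load timeout exception entirely. This sketch assumes Selenium 4, where Firefox Options exposes page_load_strategy: with the strategy set to "none", driver.get returns as soon as navigation starts, and the script then polls document.readyState for its own time budget before taking a snapshot of page_source. The names make_driver and snapshot_after are hypothetical, and the 10-second default budget simply mirrors the timeouts in the question.

```python
import time


def make_driver():
    """Hypothetical factory: headless Firefox whose get() does not wait for the load."""
    # Imported here so snapshot_after() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    # "none": driver.get() returns once navigation has started (Selenium 4 API).
    options.page_load_strategy = "none"
    return webdriver.Firefox(options=options)


def snapshot_after(driver, url, budget_s=10.0, poll_s=0.25):
    """Hypothetical helper: navigate, wait up to budget_s for the load to
    finish, then return the current DOM."""
    driver.get(url)
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        if driver.execute_script("return document.readyState") == "complete":
            break
        time.sleep(poll_s)
    else:
        # Budget exhausted without a complete load: stop further loading
        # so the snapshot is taken from a DOM that is no longer changing.
        driver.execute_script("window.stop();")
    return driver.page_source
```

Usage: driver = make_driver(); html = snapshot_after(driver, url). Because no timeout exception is ever raised, there is nothing to recover from and the driver stays open.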

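PS. Another question on SO asked the same thing, but it was closed because the OP did not state the problem properly. The only answer posted there doesn't really answer the question, because the exception does not give access to the page content.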

Exception raised:

Traceback (most recent call last):
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", line 1346, in getresponse
    response.begin()
  File "/usr/lib64/python3.6/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.6/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/development/Software/scraping/Modules/search_api.py", line 304, in get_other_datasite_response_sel
    self.driver.get(url)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 319, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py", line 374, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py", line 397, in _request
    resp = self._conn.request(method, url, body=body, headers=headers)
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/request.py", line 80, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/request.py", line 171, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/poolmanager.py", line 330, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 423, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/development/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 331, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='127.0.0.1', port=36487): Read timed out. (read timeout=<object object at 0x7fe8605ba120>)

0 answers

No answers yet.