Question

Situation

I have a simple Python script that fetches the HTML source of a given URL:

    browser = webdriver.PhantomJS()
    browser.get(url)
    content = browser.page_source

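Sometimes the URL points to a page whose external resources load slowly (for example, a video file or very slow ad content).

Webdriver waits until those resources have loaded before the .get(url) request completes.

Note: for reasons that aren't relevant here, I need to do this with PhantomJS rather than requests or urllib2.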

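Problem

I want to set a timeout on PhantomJS resource loading, so that if a resource takes too long to load, the browser just assumes it doesn't exist or something.

That would let me run the subsequent .page_source query on whatever the browser has managed to load.

The documentation on webdriver.PhantomJS is very thin, and I haven't found a similar question on SO.

Thanks in advance!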

Answer 1

Long explanation below. TL;DR:

The current version of Selenium's Ghostdriver (in PhantomJS 1.9.8) ignores the resourceTimeout option. Use webdriver's implicitly_wait() and set_page_load_timeout() instead, and wrap the get() call in a try-except block.

#Python
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

browser = webdriver.PhantomJS()
browser.implicitly_wait(3)
browser.set_page_load_timeout(3)
try:
    browser.get("http://url_here")
except TimeoutException as e:
    #Handle your exception here
    print(e)
finally:
    browser.quit()

Explanation

To pass PhantomJS page settings to Selenium, you can use webdriver's DesiredCapabilities, for example:

#Python
from selenium import webdriver
cap = webdriver.DesiredCapabilities.PHANTOMJS.copy()  # copy so the shared default dict is not mutated
cap["phantomjs.page.settings.resourceTimeout"] = 1000
cap["phantomjs.page.settings.loadImages"] = False
cap["phantomjs.page.settings.userAgent"] = "faking it"
browser = webdriver.PhantomJS(desired_capabilities=cap)

//Java
DesiredCapabilities capabilities = DesiredCapabilities.phantomjs();
capabilities.setCapability("phantomjs.page.settings.resourceTimeout", 1000);
capabilities.setCapability("phantomjs.page.settings.loadImages", false);
capabilities.setCapability("phantomjs.page.settings.userAgent", "faking it");
WebDriver webdriver = new PhantomJSDriver(capabilities);

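However, here's the catch: as of today (2014/Dec/11), with PhantomJS 1.9.8 and its embedded Ghostdriver, resourceTimeout is not applied by Ghostdriver (see Ghostdriver issue #380 on GitHub).

As a workaround, simply use Selenium's timeout functions/methods and wrap webdriver's get method in a try-except / try-catch block, e.g.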

#Python
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

browser = webdriver.PhantomJS()
browser.implicitly_wait(3)
browser.set_page_load_timeout(3)
try:
    browser.get("http://url_here")
except TimeoutException as e:
    #Handle your exception here
    print(e)
finally:
    browser.quit()

//Java
WebDriver webdriver = new PhantomJSDriver();
webdriver.manage().timeouts()
        .pageLoadTimeout(3, TimeUnit.SECONDS)
        .implicitlyWait(3, TimeUnit.SECONDS);
try {
    webdriver.get("http://url_here");
} catch (org.openqa.selenium.TimeoutException e) {
    //Handle your exception here
    System.out.println(e.getMessage());
} finally {
    webdriver.quit();
}
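Since the question asks for whatever HTML did load, here is a rough sketch of the same workaround that reads page_source after the timeout instead of quitting right away. The helper name get_source_with_timeout and the ImportError fallback are my own additions for illustration, and I am assuming the driver still answers page_source after a page-load timeout, which may not hold for every PhantomJS/Ghostdriver version.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Stand-in (assumption) so the sketch can be read and run without Selenium.
    class TimeoutException(Exception):
        pass


def get_source_with_timeout(browser, url, timeout_s=3):
    """Hypothetical helper: return (page_source, timed_out) for url."""
    browser.set_page_load_timeout(timeout_s)
    timed_out = False
    try:
        browser.get(url)
    except TimeoutException:
        # Slow resources blocked the load; keep going with what we have.
        timed_out = True
    # Assumption: the driver is still responsive after the timeout.
    return browser.page_source, timed_out
```

You would call it with the browser object from above, e.g. content, timed_out = get_source_with_timeout(browser, url), and call browser.quit() yourself once you are done with the page.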

Answer 2

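PhantomJS provides resourceTimeout, which might suit your needs. I quote from the documentation here:

(in milliseconds) defines the timeout after which any resource requested will stop trying and proceed with other parts of the page. The onResourceTimeout callback will be called on timeout.

So in Ruby, you can do something like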

require 'selenium-webdriver'

capabilities = Selenium::WebDriver::Remote::Capabilities.phantomjs("phantomjs.page.settings.resourceTimeout" => "5000")
driver = Selenium::WebDriver.for :phantomjs, :desired_capabilities => capabilities

I believe it's similar in Python (untested, this only shows the logic; you're the Python developer, so hopefully you'll figure it out)

driver = webdriver.PhantomJS(desired_capabilities={'phantomjs.page.settings.resourceTimeout': '5000'})
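If you set several page settings, a small helper for building the capabilities dict may be tidier than typing the "phantomjs.page.settings." prefix each time. This is only a sketch: phantomjs_page_settings is a made-up name, not part of Selenium, and it does nothing more than build a plain dict. Whether Ghostdriver actually honors each setting is a separate matter.

```python
def phantomjs_page_settings(base_caps=None, **settings):
    """Hypothetical helper: return a new capabilities dict with page settings added."""
    caps = dict(base_caps or {})  # copy, so the caller's dict is left untouched
    for name, value in settings.items():
        caps["phantomjs.page.settings." + name] = value
    return caps


caps = phantomjs_page_settings(resourceTimeout=5000, loadImages=False)
# Then, as in the line above: webdriver.PhantomJS(desired_capabilities=caps)
```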

Setting a timeout on selenium webdriver.PhantomJS
