Can we get the page source with Selenium even if the page has not fully loaded (TimeoutException: Message: timeout)?

Asked: 2017-07-20 14:45:12

Tags: python selenium automation

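Can we still get the page source after a TimeoutException (Message: timeout)?

When I call driver.page_source, the page has sometimes not finished loading.

But I only need part of the information, and I do not yet know which part. So I just want to save the page in any case. Is that possible?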

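import random
from time import sleep

from selenium import webdriver

def info_request(driver, project_id, project_url, path):
    driver.get(project_url)
    # Short randomized pause after the page load
    sleep(0.2 + random.uniform(0, 1.5))
    doc = driver.page_source
    # Text mode with an explicit encoding, so write the str directly
    with open(str(project_id) + ".html", "w", encoding="utf-8") as f:
        f.write(doc)
    return doc

driver = webdriver.Chrome()
driver.implicitly_wait(40)
driver.set_page_load_timeout(40)

# project_id, project_url and path are defined elsewhere in the script
project_info = info_request(driver, project_id, project_url, path)

driver.close()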

1 answer:

Answer 0 (score: 0)

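The short answer is no. Selenium does not provide access to a partial DOM. I would suggest increasing the value passed to driver.set_page_load_timeout(). This will not necessarily make the script run longer, but it gives the page more time to load when it needs it. To make the script more reliable, you can wrap driver.get(), the page_source call and the code that follows in a try/except block, so that if a timeout does occur, the remaining tests can still run instead of the script stopping on the exception.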

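import random

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

def info_request(driver, project_id, project_url, path):
    try:
        driver.get(project_url)
        # sleep(0.2 + random.uniform(0, 1.5))
        doc = driver.page_source
        # Text mode with an explicit encoding, so write the str directly
        with open(str(project_id) + ".html", "w", encoding="utf-8") as f:
            f.write(doc)
        return doc
    except TimeoutException:
        # The page did not load within the page-load timeout
        return "Timeout"

driver = webdriver.Chrome()
driver.implicitly_wait(40)
driver.set_page_load_timeout(120)  # Or your preferred patience factor

# project_id, project_url and path are defined elsewhere in the script
project_info = info_request(driver, project_id, project_url, path)

driver.close()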
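
A variation on the try/except approach, as a minimal sketch rather than a definitive implementation: the helper below returns the path of the saved file, or None on timeout, instead of the "Timeout" string. The name save_page, the out_dir parameter and the None return value are assumptions made for illustration and do not come from the question. The import fallback is only there so the sketch can be run without Selenium installed.

```python
from pathlib import Path

try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Stand-in so the sketch runs without Selenium installed (assumption).
    class TimeoutException(Exception):
        """Placeholder for selenium.common.exceptions.TimeoutException."""


def save_page(driver, project_id, project_url, out_dir="."):
    """Load project_url and save its source as <project_id>.html.

    Returns the Path of the written file, or None if the load timed out.
    `driver` only needs a get(url) method and a page_source attribute.
    """
    try:
        driver.get(project_url)
        doc = driver.page_source
    except TimeoutException:
        # Give up on this page but let the caller carry on with the next one.
        return None

    target = Path(out_dir) / f"{project_id}.html"
    # page_source is a str, so write it as text with an explicit encoding.
    target.write_text(doc, encoding="utf-8")
    return target
```

With a real driver this would be called as save_page(driver, project_id, project_url) after driver.set_page_load_timeout(...), and a None result lets a loop over many projects skip the pages that timed out.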