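Can we still get the page source (TimeoutException: Message: timeout)?
When I call `driver.page_source`, the page sometimes has not finished loading and I get a timeout.
I only need part of the information on the page, and I have not yet decided which part, so I just want to save the page in any case. Is that possible?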
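import random
from time import sleep

from selenium import webdriver

def info_request(driver, project_id, project_url, path):
    driver.get(project_url)  # raises TimeoutException if the load exceeds the timeout
    sleep(0.2 + random.uniform(0, 1.5))
    doc = driver.page_source
    # Write the text directly; encoding it first fails in text mode on Python 3.
    with open(str(project_id) + ".html", "w", encoding="utf-8") as f:
        f.write(doc)
    # The original post returned project_info, which is never defined in the code shown.
    return doc

driver = webdriver.Chrome()
driver.implicitly_wait(40)
driver.set_page_load_timeout(40)
# project_id, project_url and path are defined elsewhere in the script.
project_info = info_request(driver, project_id, project_url, path)
driver.close()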
Answer 0 (score: 0)
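The short answer is no. Selenium does not provide access to a partial DOM. I suggest raising the value passed to driver.set_page_load_timeout(). This does not necessarily make the script run longer; it only gives the page more time to load when it needs it. To make the script more reliable, you can wrap the page load, page_source and the code that follows in a try/except block, so that a timeout is caught and the rest of the script keeps running instead of stopping on the exception.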
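import random

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

def info_request(driver, project_id, project_url, path):
    try:
        driver.get(project_url)
        # sleep(0.2 + random.uniform(0, 1.5))
        doc = driver.page_source
        # Write the text directly; encoding it first fails in text mode on Python 3.
        with open(str(project_id) + ".html", "w", encoding="utf-8") as f:
            f.write(doc)
        # The question returned project_info, which is never defined in the code shown.
        return doc
    except TimeoutException:
        # Report the timeout to the caller instead of letting the script crash.
        return "Timeout"

driver = webdriver.Chrome()
driver.implicitly_wait(40)
driver.set_page_load_timeout(120)  # Or your preferred patience factor
# project_id, project_url and path are defined elsewhere in the script.
project_info = info_request(driver, project_id, project_url, path)
driver.close()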
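
If several pages are being saved in one run, the same try/except idea can be applied per page so that one timeout does not stop the others. The following is only a minimal sketch under assumptions: the names `save_pages`, `projects` and `out_dir` are hypothetical and do not appear in the question, and the fallback `TimeoutException` class exists only so the sketch can be run where Selenium is not installed.

```python
import os

try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Assumption: stand-in class so the sketch runs without Selenium installed.
    class TimeoutException(Exception):
        pass


def save_pages(driver, projects, out_dir):
    """Save each page's source to out_dir.

    projects is an iterable of (project_id, project_url) pairs (hypothetical).
    Returns a dict mapping project_id to the saved file path, or to the
    string "Timeout" when loading that page timed out.
    """
    results = {}
    for project_id, project_url in projects:
        try:
            driver.get(project_url)
            doc = driver.page_source
        except TimeoutException:
            # Record the failure and move on to the next URL.
            results[project_id] = "Timeout"
            continue
        file_path = os.path.join(out_dir, str(project_id) + ".html")
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(doc)
        results[project_id] = file_path
    return results
```

With a real driver this might be called as `save_pages(driver, [(1, url_1), (2, url_2)], "pages")`, where the output directory already exists; entries whose value is "Timeout" can then be retried later, perhaps with a larger page load timeout.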