Getting the full "Inspect Element" HTML of a web page with Python 3

Time: 2019-07-03 15:10:15

Tags: python python-requests python-3.6

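I am trying to get the HTML of a web page after its JavaScript has run, i.e. the same markup that "Inspect Element" shows. My attempt does not give the right result. Here is what I tried: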

from selenium import webdriver

url = 'xxx'

options = webdriver.ChromeOptions()
options.add_argument('--headless')
# 'chrome_options=' is deprecated (removed in recent Selenium 4 releases); use 'options='
driver = webdriver.Chrome(options=options)
driver.get(url)

# page_source: the HTML as Selenium reports it right after get() returns
html1 = driver.page_source

# innerHTML of <html>: a snapshot of the live DOM at this moment
html2 = driver.execute_script("return document.documentElement.innerHTML;")

print(html1)
print('\n\n')
print(html2)

driver.quit()

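I want the complete markup shown in Inspect Element (html2 in this case). I found that this attempt reads the page before it has finished loading. What can I do to fix this?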

2 answers:

Answer 0 (score: 0)

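You need to wait until the data you want actually appears on the page. See the Selenium documentation on explicit waits: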

https://selenium-python.readthedocs.io/waits.html#explicit-waits
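Selenium's WebDriverWait implements an explicit wait by calling a condition repeatedly (every 0.5 s by default) until it returns a truthy value or the timeout expires. As a minimal sketch of that idea, here is a stdlib-only polling loop. `wait_for_selector` is a hypothetical helper, not part of Selenium, and it assumes `driver` exposes `execute_script(script, *args)` the way a Selenium WebDriver does.

```python
import time


def wait_for_selector(driver, css_selector, timeout=10.0, poll=0.5):
    """Hypothetical helper: poll until an element matching css_selector exists.

    `driver` is assumed to be any object with an execute_script(script, *args)
    method, such as a Selenium WebDriver. Raises TimeoutError if the element
    has not appeared within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    # Extra execute_script arguments are available to the script as arguments[i].
    script = "return document.querySelector(arguments[0]) !== null;"
    while True:
        if driver.execute_script(script, css_selector):
            return  # element is present in the DOM
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{css_selector!r} not found after {timeout}s")
        time.sleep(poll)  # wait before checking again
```

With the question's `driver`, you would call `wait_for_selector(driver, "#content")` (where `#content` is a placeholder for a selector that only exists once the data has loaded) and only then read `document.documentElement.outerHTML`. Note that `outerHTML` includes the `<html>` tag itself, whereas `innerHTML` returns only its children. In production code, `WebDriverWait` is the better choice, since it also handles ignored exceptions and gives clearer timeout errors.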

Answer 1 (score: 0)

Required imports:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

This waits until an element whose ID is [ID_OF_ELEMENT] is present in the DOM, giving up after `timeout` seconds:

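timeout = 5  # seconds

try:
    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.ID, '[ID_OF_ELEMENT]')))
    # Page ready: the element is now in the DOM
    print("Page is ready")
except TimeoutException:
    # Timeout: the element did not appear within `timeout` seconds
    print("Timed out waiting for the element")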
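If no single element reliably signals that the data has arrived, one heuristic (not proposed in either answer) is to poll the serialized DOM and accept it once it has stopped changing for a few consecutive polls. This sketch assumes the page eventually settles. Pages with timers, tickers, or animations that keep rewriting the DOM may never stabilize and will hit the timeout. `wait_for_stable_dom` and its parameters are hypothetical, and `driver` is assumed to provide `execute_script` as a Selenium WebDriver does.

```python
import time


def wait_for_stable_dom(driver, timeout=15.0, poll=0.5, stable_polls=3):
    """Hypothetical helper: return the page's outerHTML once it has been
    unchanged for `stable_polls` consecutive polls.

    This is a heuristic, not a guarantee that all JavaScript has finished.
    Raises TimeoutError if the DOM is still changing after `timeout` seconds.
    """
    script = "return document.documentElement.outerHTML;"
    deadline = time.monotonic() + timeout
    last = driver.execute_script(script)
    unchanged = 0
    while unchanged < stable_polls:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"DOM still changing after {timeout}s")
        time.sleep(poll)
        current = driver.execute_script(script)
        if current == last:
            unchanged += 1  # same snapshot as before: count towards stability
        else:
            unchanged = 0   # DOM changed: restart the count
            last = current
    return last
```

With the question's `driver`, `html2 = wait_for_stable_dom(driver)` would replace the immediate `execute_script` call. This trades a fixed minimum delay (`poll * stable_polls`) for not needing to know the page's structure.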