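I am using Selenium to click through to the web page I want, and then parsing that page with Beautiful Soup.
Someone has shown "how to get inner HTML of an element in a Selenium WebDriver". Is there a way to get the HTML of the whole page? Thanks.
Sample code in Python (judging from the post above, the language does not seem to matter):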
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)
the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')
Answer 0 (score: 24)
To get the HTML of the whole page:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")
html = driver.page_source
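To connect this back to the question, the string returned by page_source can be handed straight to Beautiful Soup. The following is a minimal sketch, not a definitive implementation: page_to_soup and FakeDriver are hypothetical names introduced only for illustration, and FakeDriver stands in for a real WebDriver so the snippet runs without a browser.

```python
from bs4 import BeautifulSoup


def page_to_soup(driver):
    """Parse the full page held by a Selenium driver into a BeautifulSoup tree.

    `driver` is any object exposing a `page_source` string, which is what
    Selenium's Python WebDriver provides after `driver.get(url)`.
    """
    return BeautifulSoup(driver.page_source, "html.parser")


class FakeDriver:
    """Hypothetical stand-in for a real WebDriver (assumption for this demo)."""

    page_source = (
        "<html><head><title>Demo</title></head>"
        "<body><p id='x'>hi</p></body></html>"
    )


soup = page_to_soup(FakeDriver())
print(soup.title.string)
print(soup.find(id="x").get_text())
```

With a real browser session, the same helper would be called as page_to_soup(driver) once driver.get(url) has returned.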
To get the outer HTML (the element's own tag included):
from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")
# HTML from element with some JavaScript
# (find_element with By is the Selenium 4 locator API; find_element_by_css_selector was removed)
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('outerHTML')
To get the inner HTML (the element's own tag excluded):
from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")
# HTML from element with some JavaScript
# (find_element with By is the Selenium 4 locator API; find_element_by_css_selector was removed)
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('innerHTML')
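Since the question goes on to parse the page with Beautiful Soup, the same outer/inner distinction can also be made on the parsed tree, without another round trip to the browser. This is only a sketch under stated assumptions: the page_source string below is made-up sample markup standing in for driver.page_source, and "hireme" is simply the id used in the snippets above.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; in practice this would be driver.page_source.
page_source = '<html><body><div id="hireme"><b>Hire</b> me</div></body></html>'

soup = BeautifulSoup(page_source, "html.parser")
element = soup.find(id="hireme")

# Outer HTML: the element's own tag plus everything inside it.
outer_html = str(element)

# Inner HTML: only the markup between the opening and closing tags.
inner_html = element.decode_contents()

print(outer_html)
print(inner_html)
```

One design note: this reflects the document as it was when page_source was read, so content changed by scripts afterwards would only show up through the Selenium calls above.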
Answer 1 (score: 0)
Using a page object:
@FindBy(xpath = "xpath")  // placeholder locator
private WebElement element;

public String getInnerHtml() {
    // waitUntilElementToBeClickable is a helper defined elsewhere in the page object (not shown here)
    String innerHtml = waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML");
    System.out.println(innerHtml);
    return innerHtml;
}
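Answer 2 (score: 0)
driver.page_source may be outdated. The following worked for me:
// JavaScript (selenium-webdriver) binding; the counterpart of Python's driver.page_source
let html = await driver.getPageSource();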