I'm trying to extract a JavaScript-rendered table using Python 3 and Selenium WebDriver.
My code is as follows:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
driver.get("http://esploracolfis.sns.it/EsploraCoLFIS/#!0:t=L&l=1;1:r=T")
driver.refresh()
# Wait for the dynamically loaded elements to show up
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.TAG_NAME, "table")))
# And grab the page HTML source
html = driver.page_source
driver.quit()
print(html)
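However, when I print the page source, the JavaScript-rendered content is missing from the output. How can I extract the table I want (the table's entire HTML code)?
Thanks a lot.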
Answer 0 (score: 0)
What I did to solve your problem was use the BeautifulSoup library to parse the page source.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import bs4
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
driver.get("http://esploracolfis.sns.it/EsploraCoLFIS/#!0:t=L&l=1;1:r=T")
driver.refresh()
# Wait for the dynamically loaded elements to show up
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.TAG_NAME, "table")))
# And grab the page HTML source
html = driver.page_source
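# Turn the HTML into a BeautifulSoup object
# (the 'lxml' parser needs the lxml package; 'html.parser' is a built-in alternative)
bs4_html = bs4.BeautifulSoup(html, 'lxml')
# Find every <table> element; find_all returns a list of matches
table = bs4_html.find_all('table')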
driver.quit()
print(table)
The console output is a mile long, so I can't paste it here.
Hope that helps!
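If the full dump is too long to read, one option is to reduce each table to rows of cell text instead of printing raw HTML. Below is a minimal sketch that uses only the standard library's html.parser. The class name TableTextParser, the helper extract_tables, and the sample_html string are all made up for illustration; in practice you would pass in driver.page_source. It assumes simple, non-nested tables, so a page that nests tables inside cells would need extra handling.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by row and table.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return a list of tables, each a list of rows of cell strings."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.tables


# Invented sample input; replace with driver.page_source in a real run
sample_html = """
<html><body>
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Alpha</td><td>1901</td></tr>
</table>
</body></html>
"""

for row in extract_tables(sample_html)[0]:
    print(row)
```

Printing row by row like this keeps the output short enough to check whether the rendered data actually made it into the page source.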