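I want to scrape the table at this link. I tried using selenium to get the data after the page has loaded, but I had no success. Any other ideas on how to scrape the table from this web page?
Edit -
I tried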
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("https://steria.taleo.net/careersection/in_cs_ext_fs/jobsearch.ftl?lang=en&radiusType=K&location=462170431401&searchExpanded=true&radius=1")
print(driver.find_element_by_class_name('table').text)
driver.close()
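Answer 0 (score: 3)
Because the table content is generated dynamically, you should wait until the JavaScript has executed before reading the data you need: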
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("https://steria.taleo.net/careersection/in_cs_ext_fs/jobsearch.ftl?lang=en&radiusType=K&location=462170431401&searchExpanded=true&radius=1")
# Wait up to 10 seconds for the "jobs" table to contain at least one row
table = wait(driver, 10).until(EC.presence_of_element_located(("xpath", "//table[@id='jobs' and ./tbody/tr]")))
print(table.text)
# Go to the next page of results
next_button = driver.find_element_by_link_text("Next")
next_button.click()
# Wait up to 5 seconds until the "Next" link reports aria-disabled="true"
wait(driver, 5).until(lambda x: next_button.get_attribute("aria-disabled") == "true")
# Locate the table again and print the rows of the new page
table = wait(driver, 10).until(EC.presence_of_element_located(("xpath", "//table[@id='jobs' and ./tbody/tr]")))
print(table.text)
driver.close()
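To make clearer what the explicit wait above is doing, here is a rough sketch of the idea in plain Python: keep calling a condition until it returns something truthy or a timeout expires. This is only an illustration, not Selenium's actual implementation; `wait_until` and `rows_loaded` are hypothetical names, and the fake "page" simply returns rows on its third check.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose table rows only appear
# after some JavaScript has finished running.
attempts = {"count": 0}


def rows_loaded():
    attempts["count"] += 1
    # Pretend the rows show up on the third check.
    return ["row 1", "row 2"] if attempts["count"] >= 3 else []


rows = wait_until(rows_loaded, timeout=5.0, poll_interval=0.01)
print(rows)
```

Reading the table right after `driver.get(...)` corresponds to calling `rows_loaded()` once and getting an empty result, which is probably why the code in the question found nothing.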
Answer 1 (score: 0)
You could try Beautiful Soup; take a look at this article: http://srome.github.io/Parsing-HTML-Tables-in-Python-with-BeautifulSoup-and-pandas/
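A minimal sketch of what that could look like, assuming the `beautifulsoup4` package is installed. Since the table is filled in by JavaScript, Beautiful Soup would need the rendered HTML (for example `driver.page_source` taken after the table has loaded) rather than the raw response. The helper name `table_to_rows` and the sample HTML are made up for illustration; only the table id `jobs` comes from the XPath in the other answer.

```python
from bs4 import BeautifulSoup


def table_to_rows(html, table_id):
    """Return the table with the given id as a list of rows of cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells alike, stripped of surrounding whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Made-up sample standing in for the rendered page source.
sample_html = """
<table id="jobs">
  <thead><tr><th>Title</th><th>Location</th></tr></thead>
  <tbody>
    <tr><td>Analyst</td><td>Noida</td></tr>
    <tr><td>Developer</td><td>Pune</td></tr>
  </tbody>
</table>
"""

rows = table_to_rows(sample_html, "jobs")
for row in rows:
    print(row)
```

With Selenium you would pass `driver.page_source` instead of `sample_html`, once the wait for the table rows has succeeded.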