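Wait for a table to load completely using Selenium with Python

Date: 2014-08-09 18:07:40

Tags: python selenium selenium-webdriver web-scraping

I want to scrape some data from a table on a page, so I am only concerned with the data in that table. Previously I was using Mechanize, but I found that some data was sometimes missing, especially at the bottom of the table. After some Googling, I found that this is probably because Mechanize does not handle jQuery/Ajax.

So today I switched to Selenium. How can I wait for one, and only one, table to load completely, and then extract all the links from that table using Selenium and Python? Waiting for the full page to load takes some time; I only want to make sure the data in the table has loaded. My current code: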

driver = webdriver.Firefox()
for page in range(1, 2):
    driver.get("http://somesite.com/page/"+str(page))
    table = driver.find_element_by_css_selector('div.datatable')
    links = table.find_elements_by_tag_name('a')
    for link in links:
        print link.text

2 Answers:

Answer 0 (score: 4)

Use WebDriverWait to wait until the table is located:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

...
wait = WebDriverWait(driver, 10)
# presence_of_element_located takes a single (by, value) tuple as its locator
table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.datatable')))

This is an explicit wait.
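
Note that presence_of_element_located only checks that the element is in the DOM. If the rows are filled in later by Ajax, one option is to pass your own condition to until(), which accepts any callable that takes the driver and returns a truthy value on success. The following is a minimal sketch, not part of Selenium: table_has_links and its minimum parameter are hypothetical names, and it assumes the table counts as ready once it contains at least that many links.

```python
class table_has_links(object):
    """Wait condition: true once the located table holds enough <a> elements.

    Hypothetical helper, not part of Selenium. It has the same shape as the
    built-in expected conditions: a callable that receives the driver.
    """

    def __init__(self, table_locator, minimum=1):
        # table_locator is a (by, value) tuple,
        # e.g. (By.CSS_SELECTOR, 'div.datatable')
        self.table_locator = table_locator
        self.minimum = minimum

    def __call__(self, driver):
        tables = driver.find_elements(*self.table_locator)
        if not tables:
            return False  # table not in the DOM yet, keep polling
        # "tag name" is the string value of By.TAG_NAME
        links = tables[0].find_elements("tag name", "a")
        # Return the links themselves on success so until() hands them back.
        return links if len(links) >= self.minimum else False


# Usage sketch (needs a real browser, so it is left commented out):
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import WebDriverWait
# links = WebDriverWait(driver, 10).until(
#     table_has_links((By.CSS_SELECTOR, 'div.datatable'), minimum=1))
```

The value of minimum is a guess you have to make per site; if you do not know how many links to expect, this condition only tells you that at least some rows have arrived.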


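Alternatively, you can make the driver wait implicitly:

An implicit wait is to tell WebDriver to poll the DOM for a certain amount of time when trying to find an element or elements if they are not immediately available. The default setting is 0. Once set, the implicit wait is set for the life of the WebDriver object instance.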

from selenium import webdriver

driver = webdriver.Firefox()
driver.implicitly_wait(10) # wait up to 10 seconds while trying to locate elements
for page in range(1, 2):
    driver.get("http://somesite.com/page/"+str(page))
    table = driver.find_element_by_css_selector('div.datatable')
    links = table.find_elements_by_tag_name('a')
    for link in links:
        print link.text
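
Both kinds of wait only guarantee that the table element can be found, not that every row has been filled in. One possible heuristic for "fully loaded" is to poll until the list of links stops changing. The sketch below assumes exactly that; wait_until_stable and its parameters are hypothetical names, it is plain Python with no Selenium dependency, and it uses Python 3 (time.monotonic, TimeoutError).

```python
import time


def wait_until_stable(get_value, timeout=10.0, poll=0.5, settle=2):
    """Poll get_value() until it returns the same non-empty result
    `settle` more times in a row, or raise TimeoutError after `timeout` s.
    """
    deadline = time.monotonic() + timeout
    last, repeats = None, 0
    while True:
        value = get_value()
        if value and value == last:
            repeats += 1
            if repeats >= settle:
                return value
        else:
            # Empty or changed result: remember it and restart the count.
            last, repeats = value, 0
        if time.monotonic() >= deadline:
            raise TimeoutError("value did not stabilize within %s s" % timeout)
        time.sleep(poll)


# Usage sketch with the selector from the question (needs a real browser):
# from selenium.webdriver.common.by import By
# hrefs = wait_until_stable(
#     lambda: [a.get_attribute("href")
#              for a in driver.find_elements(By.CSS_SELECTOR, "div.datatable a")])
```

This trades speed for certainty: the call takes at least settle polling intervals after the last change, and a table that loads in slow bursts can still look stable too early, so poll and settle need tuning for the site.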

Answer 1 (score: 0)

Maybe you can use Selenium's expected conditions (http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp), for example:

>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> from selenium.webdriver.support.ui import WebDriverWait
>>> from selenium.webdriver.support import expected_conditions as EC 
>>> 
>>> ff = webdriver.Firefox()
>>> ff.get("http://www.datatables.net/examples/data_sources/js_array.html")
>>> try:
...     element = WebDriverWait(ff, 10).until(EC.presence_of_element_located((By.ID, "example")))
...     print element.text
... finally:
...     ff.quit()
... 

Engine Browser Platform Version Grade
Gecko Firefox 1.0 Win 98+ / OSX.2+ 1.7 A
Gecko Firefox 1.5 Win 98+ / OSX.2+ 1.8 A
Gecko Firefox 2.0 Win 98+ / OSX.2+ 1.8 A
Gecko Firefox 3.0 Win 2k+ / OSX.3+ 1.9 A
Gecko Camino 1.0 OSX.2+ 1.8 A
Gecko Camino 1.5 OSX.3+ 1.8 A
Gecko Netscape 7.2 Win 95+ / Mac OS 8.6-9.2 1.7 A
Gecko Netscape Browser 8 Win 98SE+ 1.7 A
Gecko Netscape Navigator 9 Win 98+ / OSX.2+ 1.8 A
Gecko Mozilla 1.0 Win 95+ / OSX.1+ 1 A