Question

I have been trying to scrape data from a table using Selenium, but when I run my code it only gets the table's header.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('http://www.panamacompra.gob.pa/Inicio/#!/busquedaAvanzada?BusquedaRubros=true&IdRubro=41')
driver.implicitly_wait(100)
table = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div[2]/div/div/div[2]/div[2]/div[3]/table/tbody')
print(table.text)

I also tried finding the element by tag name, using table.

Answer 1

You should try the following:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('http://www.panamacompra.gob.pa/Inicio/#!/busquedaAvanzada?BusquedaRubros=true&IdRubro=41')
driver.implicitly_wait(100)

table = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div[2]/div/div/div[2]/div[2]/div[3]/table/tbody')
number=2
while(number<12):
    content = driver.find_element_by_xpath('//*[@id="body"]/div/div[2]/div/div/div[2]/div[2]/div[3]/table/tbody/tr['+str(number)+']')
    print(content.text)
    number+=1

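The XPath stored in "table" only matches the header. The actual content is at '//*[@id="body"]/div/div[2]/div/div/div[2]/div[2]/div[3]/table/tbody/tr['+str(number)+']', which is why you were not getting anything other than the header. Since the XPaths of the rows look like ...../tr[2], ...../tr[3], ...../tr[4], and so on, I build each one with str(number) and loop while number < 12 to get the rows. You can also try 50 rows at a time; it's up to you.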
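The same loop can be wrapped in a small helper, which makes the row range easy to change. This is only a sketch: row_xpath and collect_row_texts are names made up for illustration, and find_row stands for whatever lookup function you use (for example the driver.find_element_by_xpath call from the code above).

```python
def row_xpath(base, number):
    """Build the XPath of the row at 1-based position `number` under `base`."""
    return f"{base}/tr[{number}]"


def collect_row_texts(find_row, base, first=2, last=11):
    """Return the text of rows first..last (inclusive).

    `find_row` is any callable that takes an XPath and returns an object
    with a `.text` attribute (hypothetical parameter, for illustration).
    """
    return [find_row(row_xpath(base, n)).text for n in range(first, last + 1)]


# Base XPath copied from the answer above.
TBODY = '//*[@id="body"]/div/div[2]/div/div/div[2]/div[2]/div[3]/table/tbody'

# With a live driver (not run here):
#     for text in collect_row_texts(driver.find_element_by_xpath, TBODY):
#         print(text)
```

The defaults first=2 and last=11 reproduce the while loop above, which starts at tr[2] and stops before tr[12].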

Answer 2

I would use requests and mimic the POST request the page makes, which is faster.

import requests

data = {'METHOD': '0','VALUE': '{"BusquedaRubros":"true","IdRubro":"41","Inicio":0}'}
r = requests.post('http://www.panamacompra.gob.pa/Security/AmbientePublico.asmx/cargarActosOportunidadesDeNegocio', data=data).json()
print(r['listActos'])
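If you need a different category or want to try other values of "Inicio", the form data for the POST request above can be built with a helper instead of a hand-written string. This is a sketch under assumptions: build_payload is a made-up name, and I am guessing that "Inicio" is the offset of the first result; the answer does not confirm that.

```python
import json


def build_payload(id_rubro, inicio=0):
    """Build the form data for the POST request shown above.

    Assumption: "Inicio" is the offset of the first result to return.
    """
    value = {"BusquedaRubros": "true", "IdRubro": str(id_rubro), "Inicio": inicio}
    # Compact separators reproduce the exact string used in the answer.
    return {"METHOD": "0", "VALUE": json.dumps(value, separators=(",", ":"))}


payload = build_payload(41)
print(payload["VALUE"])
```

You would then pass data=build_payload(41) to requests.post as in the answer. Whether the server accepts other "Inicio" values is something to check against the real responses.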

Answer 3

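Selenium loads the table (which happens fairly quickly) and then assumes it is done, so the table rows (which load more slowly) never get a chance to appear. One way around this is to keep trying to find an element that will not exist until the table has finished loading.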

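This is FAR from the most elegant solution (and there are probably Selenium libraries that do it better), but you can wait for the table by checking whether a new table row can be found and, if not, sleeping for 1 second and trying again.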

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time


driver = webdriver.Chrome()
driver.get('http://www.panamacompra.gob.pa/Inicio/#!/busquedaAvanzada?BusquedaRubros=true&IdRubro=41')

wvar = 0
while(wvar == 0):
  try:
    #try loading one of the elements we want to read
    el = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div[2]/div/div/div[2]/div[2]/div[3]/table/tbody/tr[3]')
    wvar = 1
  except NoSuchElementException:
    #not loaded yet
    print('table body empty, waiting...')
    time.sleep(1)

print('table loaded!')

#element got loaded; reload the table
table = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div/div[2]/div/div/div[2]/div[2]/div[3]/table/tbody')
print(table.text)
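Note that the loop above never gives up if the row does not appear. A possible variation is a polling helper with a timeout. This is a sketch only: wait_until and its parameters are made-up names, not part of Selenium.

```python
import time


def wait_until(check, timeout=30.0, interval=1.0, ignored=()):
    """Call `check()` every `interval` seconds until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Returns the truthy value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = check()
        except ignored:
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# With a live driver (not run here), using the row XPath from the answer:
#     row = wait_until(lambda: driver.find_element_by_xpath(ROW_XPATH),
#                      ignored=(NoSuchElementException,))
```

Here ROW_XPATH stands for the .../tbody/tr[3] path used in the answer's loop.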

Answer 4

You need to wait for the loader to disappear, which you can do with invisibility_of_element_located, WebDriverWait and expected_conditions. For the table, you can use a css_selector instead of the xpath.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
driver.get('http://www.panamacompra.gob.pa/Inicio/#!/busquedaAvanzada?BusquedaRubros=true&IdRubro=41')

time.sleep(2)
WebDriverWait(driver, 50).until(EC.invisibility_of_element_located((By.XPATH, '//img[@src="images/loading.gif"]')))
table = driver.find_element_by_css_selector('.table_asearch.table.table-bordered.table-striped.table-hover.table-condensed')
print(table.text)
driver.quit()
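As an alternative to watching the loader image, WebDriverWait.until accepts any callable that takes the driver, so you can also wait on the rows themselves. The sketch below makes some assumptions: rows_loaded is a made-up name, the selector in the usage comment is guessed from the class list above, and min_rows=2 assumes the header counts as one row, as Answer 1 suggests.

```python
def rows_loaded(css_selector, min_rows=2):
    """Return a wait condition that succeeds once enough rows are present.

    The returned callable takes the driver and returns the list of matching
    rows, or False while fewer than `min_rows` have been rendered.
    """
    def condition(driver):
        # "css selector" is the string value of By.CSS_SELECTOR.
        rows = driver.find_elements("css selector", css_selector)
        return rows if len(rows) >= min_rows else False
    return condition


# With a live driver (not run here); the selector is an assumption:
#     rows = WebDriverWait(driver, 50).until(
#         rows_loaded("table.table_asearch tbody tr"))
#     for row in rows:
#         print(row.text)
```

Because until returns whatever truthy value the condition produced, you get the row elements back directly and do not need a second lookup.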

Scraping data from a table whose elements don't load immediately
