Code "misses" some elements when scraping a dynamic table with Python Selenium

Time: 2020-06-01 16:42:39

Tags: python selenium web-scraping

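I am trying to scrape a dynamic table that contains election results for the electoral sections of an Argentine province. From this table I want to retrieve the name of each electoral circuit ('cmbCircuitos') and the number of votes cast for each political party [votos].

The problem is that although the code runs "correctly" (it raises no errors), it fails to retrieve some circuits and therefore their election results. That is, it retrieves circuit 1 twice because it could not extract circuit 2. Any idea why this happens and how I can fix it?

The code is as follows: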

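from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

driver = webdriver.Chrome('/Users/Administrador/Documents/chromedriver')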

cir = []
votos = []
votos1 = []


def switch_to_top():
   driver.switch_to.default_content()
   driver.switch_to.frame("topFrame")

def switch_to_main():
   driver.switch_to.default_content()
   driver.switch_to.frame("mainFrame")

main_url = 'https://www.justiciacordoba.gob.ar/Estatico/JEL/Escrutinios/ReportesEleccion20190512/default.html'
driver.get(main_url)

switch_to_top()

# find_element(By.ID, ...) replaces find_element_by_id, which newer Selenium releases removed
dropdown_secciones = driver.find_element(By.ID, 'cmbSecciones')
select_box_secciones = Select(dropdown_secciones)
options_secciones = select_box_secciones.options

mostrar_click = driver.find_element(By.ID, 'cmdMostrar')


for index in range(1, len(options_secciones)):
   if (index > 1):
       switch_to_top()
   select_box_secciones.select_by_index(index)

   dropdown_circuitos = driver.find_element(By.ID, 'cmbCircuitos')
   select_box_circuitos = Select(dropdown_circuitos)
   items_circuitos = select_box_circuitos.options


   for i in range(1, len(items_circuitos)):
       if (i > 1):
           switch_to_top()
       select_box_circuitos.select_by_index(i)
       mostrar_click.click()
       switch_to_main()
       WebDriverWait(driver, 220).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "body>table")))
       soup = BeautifulSoup(driver.page_source, "html.parser")

       for td in soup.find_all('td', {'class': 'c1'}):
           circuitos = td.text
           cir.append(circuitos)

       for tr in soup.find('table').find_all('tr'):

           row = tr.find_all(lambda td: td.has_attr('class'))            

           if (len(row) == 3)  and (row[0].text != 'Nº'):
               data = [td.text for td in row]
               votos.append(data)

           if (len(row) == 2) and (row[0].text != 'Nº'):
               datos = [td.text for td in row]
               votos1.append(datos)
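
One possible explanation, not verified against the live page: the wait on "body>table" in the loop above can succeed immediately against the table left over from the previous circuit, so page_source may be read before mainFrame has reloaded, which would produce the same circuit twice. A minimal sketch of one way to guard against that is to poll until the displayed content differs from what was seen last. The helper name wait_for_change and its parameters are hypothetical, and the browser is replaced here by a stand-in iterator so the snippet runs on its own.

```python
import time


def wait_for_change(read_value, previous, timeout=30.0, poll=0.5):
    """Poll read_value() until it returns something different from `previous`.

    Returns the new value, or raises TimeoutError if nothing changed in time.
    (Hypothetical helper, not part of Selenium.)
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read_value()
        if current != previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("content did not change within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for the browser: the first two reads still show the old circuit name.
_reads = iter(["Circuit 1", "Circuit 1", "Circuit 2"])
result = wait_for_change(lambda: next(_reads), previous="Circuit 1",
                         timeout=5, poll=0.01)
print(result)
```

In the scraper, read_value could be a small function that switches to mainFrame and returns the text of the td.c1 cell, with previous set to the last name appended to cir. Selenium's EC.staleness_of, applied to an element captured before the click, is another way to express the same wait. Note that this check would time out if two consecutive circuits legitimately displayed the same name.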

0 answers:

No answers