Python 3.5 + Selenium scraping: is there any way to select <a></a> tags?

Time: 2015-12-01 20:38:17

Tags: python selenium xpath web-scraping

So I'm very new to Python and Selenium. I'm writing a scraper to fetch some account balances and download a txt file. So far I've managed to grab the account balances, but downloading the txt file has proven difficult. This is a sample of the HTML:

<td>
 <div id="expoDato_msdd" class="dd noImprimible" style="width: 135px">
  <div id="expoDato_title123" class="ddTitle">
   <span id="expoDato_arrow" class="arrow" style="background-position: 0pt 0pt"></span>
   <span id="expoDato_titletext" class="textTitle">Exportar Datos</span>
  </div>
  <div id="expoDato_child" class="ddChild" style="width: 133px; z-index: 50">
   <a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=txt">txt</a>
   <a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=pdf">PDF</a>
   <a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=excel">Excel</a>
   <a class="modal" href="#info_formatos">Información Formatos</a>
  </div>
 </div>

I need to click on the first <a> with class="enabled", but I just can't manage to reach it by XPath, class, or anything else. Here is the last thing I tried:

#Descarga de Archivos
ddmenu2 = driver.find_element_by_id("expoDato_child")
ddmenu2.find_element_by_css_selector("txt").click()

Here is more of what I've already tried:

#TXT = driver.select
#TXT.send_keys(Keys.RETURN)
#ddmenu2 = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]")
#Descarga = ddmenu2.find_element_by_visible_text("txt")
#Descarga.send_keys(Keys.RETURN)

I would appreciate your help.

PS: English is not my native language, so I'm sorry for any confusion.

EDIT:

This was the approach that worked. I'll try your other suggestions to make the code neater. Note that it only works if the mouse pointer is over the browser window; where exactly doesn't matter.

ddmenu2a = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[1]").click()
ddmenu2b = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]")
ddmenu2c = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]/a[1]").click()

Pretty much brute force, but I'm getting to like Python scripting.

4 Answers:

Answer 0 (score: 1)

Or just use a CSS selector that matches the href:

driver.find_element_by_css_selector("div#expoDato_child a.enabled[href*='txt']")
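The `find_element_by_*` helpers used in this thread were removed in Selenium 4.3, so with a current Selenium the same selector goes through `driver.find_element(by, value)`. Below is a minimal sketch of that, assuming (as the asker's edit suggests) that the dropdown title has to be clicked before the links become clickable. The function name `click_export_link` and the `#expoDato_msdd .ddTitle` selector are illustrative choices based on the HTML sample, not something confirmed against the real page.

```python
# The locator strategy is passed as a plain string here; "css selector" is the
# same value as selenium.webdriver.common.by.By.CSS_SELECTOR, which is what
# you would normally import and use instead.
CSS = "css selector"


def click_export_link(driver, export_type="txt"):
    """Open the export dropdown, then click the link for `export_type`.

    `driver` is any object with Selenium's find_element(by, value) method.
    Assumption: clicking the dropdown title reveals the links.
    """
    # Step 1: open the dropdown by clicking its title bar.
    driver.find_element(CSS, "#expoDato_msdd .ddTitle").click()

    # Step 2: pick the enabled link whose href mentions the export type.
    selector = (
        "div#expoDato_child a.enabled[href*='tipoExportacion=%s']" % export_type
    )
    link = driver.find_element(CSS, selector)
    link.click()
    return link
```

With a real browser session this would be called as `click_export_link(driver)` for the txt file or `click_export_link(driver, "pdf")` for the PDF. If the menu takes a moment to appear, an explicit wait before the second lookup may be needed.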

Answer 1 (score: 0)

You can get all the anchor elements like this:

a_list = driver.find_elements_by_tag_name('a')

This returns a list of elements. You can click each one:

for a in a_list:
    a.click()
    driver.back()

Or try an XPath for each anchor element:

a1 = driver.find_element_by_xpath('//a[@class="enabled"][1]')
a2 = driver.find_element_by_xpath('//a[@class="enabled"][2]')
a3 = driver.find_element_by_xpath('//a[@class="enabled"][3]')

Let me know if this helps.

Answer 2 (score: 0)

You can reach the element directly with an XPath that matches its text:

driver.find_element_by_xpath("//*[@id='expoDato_child']/a[contains(., 'txt')]").click()
driver.find_element_by_xpath("//*[@id='expoDato_child']/a[contains(., 'PDF')]").click()
...

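Answer 3 (score: 0)

It would help if there were a public link to the page in question.

In general, though, I can think of two approaches:

If you can find the direct link, you can extract its URL and download the file directly with Python's urllib.

Use Selenium's click function to click the link on the page.

A quick search turns up: downloading-file-using-selenium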
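The first approach (extract the direct link, then download with urllib) can be sketched with the standard library alone. This is only an illustration under assumptions: `BASE_URL` is a made-up address, `EnabledLinkParser`, `find_export_url` and `download` are hypothetical helper names, and the real site may well require the logged-in session's cookies before it serves the file, which this sketch does not handle. In a Selenium script the HTML would typically come from `driver.page_source`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class EnabledLinkParser(HTMLParser):
    """Collect the href of every <a> whose class list includes "enabled"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "enabled" in classes and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_export_url(page_html, base_url, export_type="txt"):
    """Return the absolute URL of the export link, or None if not found."""
    parser = EnabledLinkParser()
    parser.feed(page_html)
    for href in parser.hrefs:
        if href.endswith("tipoExportacion=" + export_type):
            # The hrefs in the sample are site-relative, so resolve them.
            return urljoin(base_url, href)
    return None


def download(url, dest_path):
    """Fetch `url` and write the response body to `dest_path`."""
    with urlopen(url) as response, open(dest_path, "wb") as out:
        out.write(response.read())


# Trimmed copy of the HTML sample from the question.
SAMPLE_HTML = """
<div id="expoDato_child" class="ddChild">
 <a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=txt">txt</a>
 <a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=pdf">PDF</a>
 <a class="modal" href="#info_formatos">Información Formatos</a>
</div>
"""

BASE_URL = "https://bank.example/cartola/index.do"  # hypothetical address

txt_url = find_export_url(SAMPLE_HTML, BASE_URL)
print(txt_url)
# download(txt_url, "cartola.txt")  # not run here: needs the real site
```

The second approach (clicking the link in Selenium) leaves the download to the browser, so the file lands wherever the browser profile is configured to save it.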