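The website below contains several tables, but my code cannot retrieve the one I need (or any other table, for that matter).
The code is meant to fetch the data from the table "Ações em Circulação no Mercado", which is the last table on the page.
I have tried the code below and a few alternatives, but none of them worked for me: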
import pandas as pd
from selenium import webdriver
from time import sleep
url = "http://bvmf.bmfbovespa.com.br/cias-Listadas/Empresas-Listadas/BuscaEmpresaListada.aspx?idioma=pt-br"
Ticker='ITUB4'
browser = webdriver.Chrome()
browser.get(url)
sleep(2) #Wait webpage to load
browser.find_element_by_xpath(('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_txtNomeEmpresa_txtNomeEmpresa_text"]')).send_keys(Ticker)
browser.find_element_by_xpath(('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_btnBuscar"]')).click();
sleep(2) #Wait webpage to load
browser.find_element_by_xpath(('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_grdEmpresa_ctl01"]/tbody/tr/td[1]/a')).click();
sleep(5) #Wait webpage to load
#This is not working
content = browser.find_element_by_css_selector('//div[@id="div1"]')
#This is not working as well
#browser.find_element_by_xpath('//*[@id="div1"]/div/div/div[1]/table/tbody/tr[1]/td[1]').text
The table and the full HTML can be found here:
The HTML is:
<div id="div1">
<div>
<h3>Ações em Circulação no Mercado</h3>
<div class="table-wrapper"><div class="scrollable"><table class="responsive">
<thead>
<tr>
<th colspan="3" class="text-center">19/04/2017</th>
</tr>
<tr>
<td>Tipos de Investidores / Ações</td>
<td class="text-center">Quantidade</td>
<td class="text-center">Percentual</td>
</tr>
</thead>
<tbody><tr>
<td>Pessoas Físicas</td>
<td class="text-right">108.853</td>
<td class="text-right"> - </td>
</tr>
<tr>
<td>Pessoas Jurídicas</td>
<td class="text-right">11.591</td>
<td class="text-right"> - </td>
</tr>
<tr>
<td>Investidores Institucionais</td>
<td class="text-right">1.039</td>
<td class="text-right"> - </td>
</tr>
<tr>
<td>Quantidade de Ações Ordinárias</td>
<td class="text-right">272.710.309</td>
<td class="text-right">8,21</td>
</tr>
<tr>
<td>Quantidade de Ações Preferenciais</td>
<td class="text-right">3.141.058.175</td>
<td class="text-right">97,23</td>
</tr>
<tr>
<td>Total de Ações</td>
<td class="text-right">3.413.768.484</td>
<td class="text-right">52,11</td>
</tr>
</tbody></table></div><div class="pinned"></div></div>
</div>
</div>
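Answer 0 (score: 1)
You passed an XPath expression where a CSS selector is expected. If you want all the tables, use
tables = browser.find_elements_by_css_selector('.responsive')
and then parse each of them. Alternatively,
use browser.find_element_by_xpath("//*[@id='div1']//table")
to locate the exact table. Note the // before table: in the posted HTML the table sits inside two wrapper divs, so it is a descendant of div1, not a direct grandchild.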
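As a quick offline check of these locators, the sketch below runs them against a trimmed copy of the HTML posted in the question, using Python's standard-library xml.etree.ElementTree instead of Selenium. ElementTree supports only a subset of XPath, the names SNIPPET and table_rows are made up for illustration, and the snippet keeps only two body rows, so treat it as a sanity check rather than a replacement for testing against the live page.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the HTML from the question (well-formed, so an XML parser can read it).
SNIPPET = """
<div id="div1">
<div>
<h3>Ações em Circulação no Mercado</h3>
<div class="table-wrapper"><div class="scrollable"><table class="responsive">
<thead>
<tr><th colspan="3" class="text-center">19/04/2017</th></tr>
</thead>
<tbody>
<tr><td>Pessoas Físicas</td><td class="text-right">108.853</td><td class="text-right"> - </td></tr>
<tr><td>Total de Ações</td><td class="text-right">3.413.768.484</td><td class="text-right">52,11</td></tr>
</tbody></table></div><div class="pinned"></div></div>
</div>
</div>
"""


def table_rows(table):
    """Return the stripped cell texts of every <tbody> row of a table element."""
    return [[(td.text or "").strip() for td in tr.findall("td")]
            for tr in table.findall("./tbody/tr")]


# Wrap the snippet so that div1 is a descendant of the root, as it is on a real page.
root = ET.fromstring("<body>" + SNIPPET + "</body>")

# The table is nested in two wrapper divs, so a direct-child path finds nothing ...
direct = root.findall(".//*[@id='div1']/div/table")
# ... while a descendant search (//) under div1 finds it.
nested = root.findall(".//*[@id='div1']//table")
# Rough equivalent of the CSS selector '.responsive' for this snippet.
by_class = root.findall(".//table[@class='responsive']")

rows = table_rows(nested[0])
print(len(direct), len(nested), len(by_class))
print(rows)
```

Both the descendant XPath and the class-based lookup land on the same table element here, which is why either approach from this answer should be workable once the selector type matches the method being called.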
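Answer 1 (score: 1)
One quick correction you can make is to change content = browser.find_element_by_css_selector('//div[@id="div1"]')
to content = browser.find_element_by_xpath('//div[@id="div1"]')
, because the string you are passing is actually an XPath expression.
The second attempt may be failing because the div1 element has not been scrolled into view. Selenium does not interact well with elements that are not visible. So try this: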
element = browser.find_element_by_xpath('//*[@id="div1"]')
# Force the element to be scrolled into view, even if you don't need its location.
location = element.location_once_scrolled_into_view
# Now Selenium can get its text.
text = element.text
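The same idea can be wrapped in a small helper. This is only a sketch and not part of the original answer: the function name read_text is hypothetical, it assumes a Selenium 4 style driver whose find_element takes a locator strategy and a value (the plain string "xpath" is the value of By.XPATH), and the fallback to the textContent attribute is an added assumption for cases where .text comes back empty.

```python
def read_text(driver, xpath):
    """Scroll the element matched by `xpath` into view and return its text.

    `driver` is assumed to be a Selenium WebDriver, for example the `browser`
    object from the question. "xpath" is the string value of By.XPATH.
    """
    element = driver.find_element("xpath", xpath)
    # Reading this property asks the browser to scroll the element into view.
    _ = element.location_once_scrolled_into_view
    text = element.text
    if not text:
        # .text only returns rendered text; textContent also covers hidden nodes.
        text = element.get_attribute("textContent") or ""
    return text.strip()


# Usage with a live session (not run here):
# print(read_text(browser, '//*[@id="div1"]'))
```

Keeping the scroll and the read in one function makes it harder to forget the scroll step when the same lookup is repeated for several elements.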
Answer 2 (score: 1)
To locate the WebElement and extract the text Pessoas Físicas, you can use the following line of code:
content = driver.find_element_by_xpath("//h3[contains(., 'Ações em Circulação no Mercado')]/following::table[@class='responsive'][1]/tbody/tr[1]/td[1]").get_attribute("innerHTML")
The XPath
expression:
//h3[contains(., 'Ações em Circulação no Mercado')]/following::table[@class='responsive'][1]/tbody/tr[1]/td[1]
contains single-quoted literals, so it should not itself be wrapped in single quotes, e.g. 'xpath_here'
. Put the expression in double quotes instead, e.g. "xpath_here"
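The quoting rule can be illustrated in plain Python, without a browser. The sketch below builds an XPath that contains single-quoted literals inside a double-quoted Python string; the variable names are illustrative, and the contains() predicate anchored on the heading text is one possible way to write the locator, not necessarily the one the author of this answer had in mind.

```python
heading = "Ações em Circulação no Mercado"

# Double quotes delimit the Python string, single quotes delimit the XPath literals.
xpath = (
    f"//h3[contains(., '{heading}')]"
    "/following::table[@class='responsive'][1]"
    "/tbody/tr[1]/td[1]"
)

# Wrapping the same expression in single quotes works only if every inner quote is escaped.
escaped = (
    '//h3[contains(., \'' + heading + '\')]'
    '/following::table[@class=\'responsive\'][1]'
    '/tbody/tr[1]/td[1]'
)

print(xpath)
print(xpath == escaped)
```

Both strings are identical once Python has parsed them; the double-quoted form is simply easier to read and harder to get wrong.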