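I need to scrape this URL in a robust way: "http://www.screener.com/v2/stocks/view/5131"
However, my result contains too much blank space before and between the values I want, and the approach is not robust.
The part I need is 11.48, 9.05, 11.53,
taken from the HTML below: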
<div class="table-responsive">
<table class="table table-hover">
<tr>
<th>Financial Year</th>
<th class="number">Revenue ('000)</th>
<th class="number">Net ('000)</th>
<th class="number">EPS</th>
<th></th>
</tr>
<tr>
<td>30 Nov, 2017</td>
<td class="number">205,686</td>
<td class="number">52,812</td>
<td class="number">11.48</td>
<td></td>
</tr>
<tr>
<td>30 Nov, 2016</td>
<td class="number">191,301</td>
<td class="number">41,598</td>
<td class="number">9.05</td>
<td></td>
</tr>
<tr>
<td>30 Nov, 2015</td>
<td class="number">225,910</td>
<td class="number">51,082</td>
<td class="number">11.53</td>
<td></td>
</tr>
My code is as follows:
from lxml import html
import requests
page = requests.get('http://www.screener.com/v2/stocks/view/5131')
output = html.fromstring(page.content)
output.xpath('//tr/td/following-sibling::td/text()')
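How can I change the code so that it robustly extracts the three numbers from the table, as shown above?
The only output I want is 11.48, 9.05, 11.53,
but I can't get rid of all the other data in the table.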
Answer 0: (score: 0)
Try the following XPath to get the desired output:
//div[@id="annual"]//tr/td[position() = last() - 1]/text()
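Here is a minimal sketch of how that expression might be applied with lxml. It runs against an inline copy of the table from the question rather than the live page, so `SAMPLE` and the helper name `extract_eps` are hypothetical. The wrapping `<div id="annual">` is also an assumption: it is what the XPath expects, but it does not appear in the snippet posted in the question. The predicate `position() = last() - 1` picks the second-to-last `td` in each row (the EPS column), and the header row is skipped because it contains `th` cells only.

```python
from lxml import html

# Inline copy of the table from the question. The outer div with id="annual"
# is an assumption; it is not visible in the posted snippet.
SAMPLE = """
<div id="annual">
  <div class="table-responsive">
    <table class="table table-hover">
      <tr>
        <th>Financial Year</th><th class="number">Revenue ('000)</th>
        <th class="number">Net ('000)</th><th class="number">EPS</th><th></th>
      </tr>
      <tr>
        <td>30 Nov, 2017</td><td class="number">205,686</td>
        <td class="number">52,812</td><td class="number">11.48</td><td></td>
      </tr>
      <tr>
        <td>30 Nov, 2016</td><td class="number">191,301</td>
        <td class="number">41,598</td><td class="number">9.05</td><td></td>
      </tr>
      <tr>
        <td>30 Nov, 2015</td><td class="number">225,910</td>
        <td class="number">51,082</td><td class="number">11.53</td><td></td>
      </tr>
    </table>
  </div>
</div>
"""


def extract_eps(markup):
    """Return the EPS column (second-to-last td of each row) as floats."""
    tree = html.fromstring(markup)
    cells = tree.xpath(
        '//div[@id="annual"]//tr/td[position() = last() - 1]/text()'
    )
    # Strip surrounding whitespace before converting each cell to a number.
    return [float(cell.strip()) for cell in cells]


eps = extract_eps(SAMPLE)
print(eps)  # [11.48, 9.05, 11.53]
```

For the live page, the same helper could be called as `extract_eps(requests.get(url).content)`, provided the page really does wrap the table in an element with `id="annual"`.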