python: Using lxml XPath to get data from a span whose class changes

Time: 2016-11-08 13:40:32

Tags: python html xpath lxml

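I want to extract 'Return on Assets' from the WSJ website. However, my code is not robust enough to work under different conditions. I can extract the value for the stock code 'SCGM' with the code below, but it fails for 'AASIA', where the value sits in <span class="marketDelta deltaType-negative"> instead.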

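from lxml import html
import requests

StockData = ['SCGM', 'AASIA']
x = 0  # index into StockData: 0 (SCGM) works, 1 (AASIA) fails
page_wsj1 = requests.get('http://quotes.wsj.com/MY/' + StockData[x] + '/financials')
wsj1 = html.fromstring(page_wsj1.content)
# Matches only spans whose class is exactly "marketDelta noChange"
wsj_fig = wsj1.xpath('//span[@class="marketDelta noChange"]/text()')
# Assumes the ROA figure is always the 26th match
ROA = wsj_fig[25]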

This works for SCGM, but not for AASIA, because the span class is different there. For SCGM, the HTML is as follows (the full page is linked in the original post):

<tr> <td> <span class="data_lbl">Return on Assets</span> <span class="data_data"> <span class="marketDelta noChange">18.26</span> </span> </td> </tr>

For AASIA, the HTML is as follows (the full page is linked in the original post):

<tr> <td> <span class="data_lbl">Return on Assets</span> <span class="data_data"> <span class="marketDelta deltaType-negative">-1.36</span> </span> </td> </tr>

How can I write code that works in both cases, or that points directly at 'Return on Assets'?

1 Answer:

Answer 0: (score: 0)

//td[normalize-space(span) = "Return on Assets"]/span[@class = "data_data"]/span
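
This expression selects the cell by its label text rather than by the class of the value span, so it should not matter whether that class is "noChange" or "deltaType-negative". A minimal sketch of how it might be used with lxml follows. It runs against the two HTML fragments quoted in the question (wrapped in a table) rather than the live site, and the names ROA_XPATH, extract_roa, SCGM_HTML and AASIA_HTML are made up for illustration:

```python
from lxml import html

# XPath from the answer: find the <td> whose first <span> child reads
# "Return on Assets", then take the span nested inside "data_data",
# whatever its class happens to be.
ROA_XPATH = ('//td[normalize-space(span) = "Return on Assets"]'
             '/span[@class = "data_data"]/span')


def extract_roa(page_source):
    """Return the 'Return on Assets' figure as a float, or None if absent."""
    tree = html.fromstring(page_source)
    nodes = tree.xpath(ROA_XPATH)
    if not nodes:
        return None
    return float(nodes[0].text_content().strip())


# Sample markup copied from the question (hypothetical wrappers added).
SCGM_HTML = (
    '<html><body><table><tr> <td> <span class="data_lbl">Return on Assets</span> '
    '<span class="data_data"> <span class="marketDelta noChange">18.26</span> '
    '</span> </td> </tr></table></body></html>'
)
AASIA_HTML = (
    '<html><body><table><tr> <td> <span class="data_lbl">Return on Assets</span> '
    '<span class="data_data"> <span class="marketDelta deltaType-negative">-1.36</span> '
    '</span> </td> </tr></table></body></html>'
)

print(extract_roa(SCGM_HTML))   # 18.26
print(extract_roa(AASIA_HTML))  # -1.36
```

With a live page, the same function could be given page_wsj1.content from the question's requests call, assuming the site still serves markup of this shape.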