I'm experimenting with Scrapy. I have the following:
hxs.select('//span[contains(@itemprop, "price")]').extract()
Output:
[u'<span itemprop="price" class="offer_price">\n<span class="currency">\u20ac</span>\n16<span class="offer_price_fraction">,95</span>\n</span>']
How can I get this output instead:
16.95
In other words, I want to append the fractional-price span to the whole price and replace the comma with a dot.
Answer 0 (score: 1)
Here is how I set up the XPath selector:
>>> hxs.extract()
u'<html><body><span itemprop="price" class="offer_price">\n<span class="currency">\u20ac</span>\n16<span class="offer_price_fraction">,95</span>\n</span></body></html>'
And here is how you can get the desired result:
>>> price = 'descendant::span[@itemprop="price"]'
>>> whole = 'text()'
>>> fract = 'descendant::span[@class="offer_price_fraction"]/text()'
>>> s = hxs.select(price).select('%s | %s' % (whole, fract)).extract()
>>> s
[u'\n', u'\n16', u',95', u'\n']
>>> ''.join(s).strip().replace(',', '.')
u'16.95'
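The same idea can be sketched outside Scrapy with only the standard library's xml.etree.ElementTree. This assumes the snippet is well-formed XML, which the sample in the question happens to be; real pages usually need an HTML-tolerant parser such as Scrapy's own selectors. ElementTree has no separate text() nodes: the span's own text lives in `price.text` and in each child's `.tail`. Collecting those together with the fraction span's text, in document order, reproduces the same list of pieces as `s` in the answer. The names `MARKUP` and `extract_price` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question; assumed well-formed so ElementTree can parse it.
MARKUP = (
    '<span itemprop="price" class="offer_price">\n'
    '<span class="currency">\u20ac</span>\n'
    '16<span class="offer_price_fraction">,95</span>\n'
    '</span>'
)


def extract_price(markup: str) -> str:
    """Join the price span's own text with the fraction text, then normalize."""
    root = ET.fromstring(markup)
    # iter() includes the root itself, which here is the price span.
    price = next(el for el in root.iter("span") if el.get("itemprop") == "price")
    parts = [price.text or ""]  # text before the first child span
    for child in price:
        if child.get("class") == "offer_price_fraction":
            parts.append(child.text or "")  # ',95'
        parts.append(child.tail or "")  # text after each child, e.g. '\n16'
    # parts == ['\n', '\n16', ',95', '\n'], the same pieces as `s` in the answer
    return "".join(parts).strip().replace(",", ".")


print(extract_price(MARKUP))  # 16.95
```

The currency span's own text is skipped on purpose. Only its tail, which holds the whole-number part, is collected.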
Answer 1 (score: 1)
Use this single XPath expression:
translate(
concat(//span[@itemprop = 'price']/text()[normalize-space()],
//span[@itemprop = 'price']/span[@class='offer_price_fraction']
),
',',
'.'
)
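This expression relies on three XPath 1.0 rules. First, `text()[normalize-space()]` keeps only text nodes that are not whitespace-only. Second, `concat()` converts each node-set argument to the string-value of its first node in document order. Third, `translate(s, ',', '.')` maps every comma to a dot. The following is a plain-Python emulation of those rules, not a real XPath evaluator, that can be used to reason about the result. It assumes a single price span, and the name `emulate_price_xpath` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same sample markup as in the question (assumed well-formed XML).
MARKUP = (
    '<span itemprop="price" class="offer_price">\n'
    '<span class="currency">\u20ac</span>\n'
    '16<span class="offer_price_fraction">,95</span>\n'
    '</span>'
)


def emulate_price_xpath(markup: str) -> str:
    """Mimic translate(concat(A, B), ',', '.') from the answer.

    A = //span[@itemprop='price']/text()[normalize-space()]
    B = //span[@itemprop='price']/span[@class='offer_price_fraction']
    """
    root = ET.fromstring(markup)
    # Assumption: only the first price span matters.
    price = next(el for el in root.iter("span") if el.get("itemprop") == "price")
    # The span's own text nodes in document order: .text, then each child's .tail.
    own_text = [price.text] + [child.tail for child in price]
    # [normalize-space()] drops whitespace-only nodes; concat() uses the FIRST node left.
    a = next((t for t in own_text if t and t.strip()), "")
    # An element's string-value is all of its descendant text joined together.
    fraction = next((c for c in price if c.get("class") == "offer_price_fraction"), None)
    b = "".join(fraction.itertext()) if fraction is not None else ""
    # translate(..., ',', '.') replaces each ',' with '.', like str.translate.
    return (a + b).translate(str.maketrans(",", "."))


print(repr(emulate_price_xpath(MARKUP)))  # '\n16.95'
```

The emulated result keeps the newline in front of "16", because it belongs to the text node `\n16`. If that matters, wrapping the whole expression in XPath's `normalize-space(...)` should strip it.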
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"translate(
concat(//span[@itemprop = 'price']/text()[normalize-space()],
//span[@itemprop = 'price']/span[@class='offer_price_fraction']
),
',',
'.'
)"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<span itemprop="price" class="offer_price">
<span class="currency">€</span>
16<span class="offer_price_fraction">,95</span>
</span>
the XPath expression is evaluated and the result of this evaluation is copied to the output:
16.95