Python / XPath: transforming a query result

Date: 2013-03-02 17:26:28

Tags: python xpath scrapy

I'm experimenting with Scrapy. I have the following:

hxs.select('//span[contains(@itemprop, "price")]').extract()

Output:

[u'<span itemprop="price" class="offer_price">\n<span class="currency">\u20ac</span>\n16<span class="offer_price_fraction">,95</span>\n</span>']

How can I retrieve this output instead:

16.95

In other words, take the whole-number part of the price, append the text of the fraction span, and replace the "," with ".".

2 answers:

Answer 0 (score: 1)

Here is how I set up the XPath selector:

>>> hxs.extract()
u'<html><body><span itemprop="price" class="offer_price">\n<span class="currency">\u20ac</span>\n16<span class="offer_price_fraction">,95</span>\n</span></body></html>'

Here is how you can get the desired result:

>>> price = 'descendant::span[@itemprop="price"]'
>>> whole = 'text()'
>>> fract = 'descendant::span[@class="offer_price_fraction"]/text()'
>>> s = hxs.select(price).select('%s | %s' % (whole, fract)).extract()
>>> s
[u'\n', u'\n16', u',95', u'\n']
>>> ''.join(s).strip().replace(',', '.')
u'16.95'
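
For readers without a Scrapy shell at hand, the same join, strip, and replace idea can be sketched with only the Python 3 standard library. This is an illustration under stated assumptions, not the Scrapy API: extract_price and HTML are hypothetical names, and xml.etree.ElementTree stands in for the selector, which only works because this fragment happens to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# The fragment from the question (assumed to be well-formed XML).
HTML = (
    '<span itemprop="price" class="offer_price">\n'
    '<span class="currency">\u20ac</span>\n'
    '16<span class="offer_price_fraction">,95</span>\n'
    '</span>'
)


def extract_price(markup):
    """Return the price as a string such as '16.95', or None if not found."""
    root = ET.fromstring(markup)
    # find() only searches descendants, so check the root element itself first.
    if root.get("itemprop") == "price":
        price = root
    else:
        price = root.find(".//span[@itemprop='price']")
    if price is None:
        return None

    # Collect the span's own text nodes plus the text of the fraction span,
    # in document order (the currency symbol is deliberately skipped).
    parts = [price.text or ""]
    for child in price:
        if child.get("class") == "offer_price_fraction":
            parts.append(child.text or "")
        parts.append(child.tail or "")

    return "".join(parts).strip().replace(",", ".")


print(extract_price(HTML))
```

The collected parts are the same four strings shown in the session above, so the final join, strip, and replace step is unchanged.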

Answer 1 (score: 1)

Use this single XPath expression:

   translate(
             concat(//span[@itemprop = 'price']/text()[normalize-space()],
                    //span[@itemprop = 'price']/span[@class='offer_price_fraction']
                    ),
             ',',
             '.'
             )

XSLT-based verification:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "translate(
          concat(//span[@itemprop = 'price']/text()[normalize-space()],
                  //span[@itemprop = 'price']/span[@class='offer_price_fraction']
                 ),
           ',',
           '.'
            )"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<span itemprop="price" class="offer_price">
  <span class="currency">€</span>
16<span class="offer_price_fraction">,95</span>
</span>

the XPath expression is evaluated and the result of this evaluation is copied to the output:

16.95
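
To see what concat() and translate() contribute, here is a rough standard-library emulation of the expression in Python 3. It is a sketch, not an XPath engine: xpath_translate, first_nonblank_text, and DOC are hypothetical names introduced for illustration, and ElementTree is assumed to be an acceptable stand-in for the node selection.

```python
import xml.etree.ElementTree as ET

# The XML document used in the verification (assumed input).
DOC = (
    '<span itemprop="price" class="offer_price">\n'
    '  <span class="currency">\u20ac</span>\n'
    '16<span class="offer_price_fraction">,95</span>\n'
    '</span>'
)

XML_WHITESPACE = " \t\r\n"


def xpath_translate(value, src, dst):
    """Emulate XPath 1.0 translate(): map src[i] to dst[i]; characters of
    src with no counterpart in dst are removed; first occurrence wins."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:
            table[ord(ch)] = dst[i] if i < len(dst) else None
    return value.translate(table)


def first_nonblank_text(elem):
    """Emulate the string value of text()[normalize-space()]: the first
    direct text node of elem that is not whitespace-only."""
    for text in [elem.text] + [child.tail for child in elem]:
        if text and text.strip(XML_WHITESPACE):
            return text
    return ""


price = ET.fromstring(DOC)
whole = first_nonblank_text(price)
fraction_elem = price.find("span[@class='offer_price_fraction']")
fraction = "".join(fraction_elem.itertext())

# concat() is plain string concatenation; translate() swaps ',' for '.'.
result = xpath_translate(whole + fraction, ",", ".")
print(repr(result))    # '\n16.95'
print(result.strip())  # 16.95
```

Note that the selected text node keeps its leading line break, so the raw result is '\n16.95'. If that matters, wrapping the whole expression in normalize-space(...) on the XPath side, or calling strip() on the Python side, should remove it.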