Scrapy: Parsing text between <br> tags

Date: 2015-12-26 21:18:46

Tags: scrapy

I have HTML like this:

<span itemprop="description">
    Colour: Blue
    <br>
    Fabric: Cotton Silk
    <br>
    Type Of Work: Printed
    <br><br>
    Product colour may slightly vary due to photographic lighting sources or your monitor settings.
</span>

I want to parse the text between the <br> tags and get each piece separately. The expected result is:

["Colour: Blue", "Fabric: Cotton Silk", "Product colour may slightly vary due to photographic lighting sources or your monitor settings."]

I have tried

response.xpath('//*[@itemprop="description"]/text()').extract()

but this puts the whole text into a single string.

How can I split the text around the <br> tags?

1 answer:

Answer 0 (score: 0)

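I tried your code, and it appears to work: the XPath returns one string per text node. I made a small adjustment and used the re() method to strip the surrounding whitespace from the extracted data: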

>>> sel.xpath('//span[@itemprop="description"]/text()').re(r"\s*(.+)\s*")
[u'Colour: Blue', u'Fabric: Cotton Silk', u'Type Of Work: Printed',  u'Product colour may slightly vary due to photographic lighting sources or your monitor settings.']

Is this what you need?
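
To see what the pattern does to each text node, here is a minimal sketch using only Python's standard `re` module. It only approximates what the selector's re() call does (apply the pattern to every selected node and collect the captured groups). The `raw_nodes` list is an assumption: it is what I would expect `/text()` to select for the HTML in the question, and the exact whitespace depends on the page. The helper names `clean_with_regex` and `clean_with_strip` are made up for illustration.

```python
import re

# Assumed text nodes of the <span> from the question: one string per run of
# text between tags. The exact indentation/newlines are a guess.
raw_nodes = [
    "\n    Colour: Blue\n    ",
    "\n    Fabric: Cotton Silk\n    ",
    "\n    Type Of Work: Printed\n    ",
    "\n    Product colour may slightly vary due to photographic lighting "
    "sources or your monitor settings.\n",
]

# Same pattern as in the answer, written as a raw string.
PATTERN = re.compile(r"\s*(.+)\s*")


def clean_with_regex(nodes):
    """Apply the pattern to every node and collect the captured groups."""
    results = []
    for node in nodes:
        # "." does not match a newline, so the group captures one line of text.
        results.extend(PATTERN.findall(node))
    return results


def clean_with_strip(nodes):
    """Alternative without a regex: strip each node and drop empty ones."""
    return [node.strip() for node in nodes if node.strip()]


if __name__ == "__main__":
    for line in clean_with_regex(raw_nodes):
        print(line)
```

If a node could contain several lines of text, the two helpers would differ: the regex version yields one item per line, while the strip() version keeps the node as a single string.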