Question

I have come across HTML like this:

<span itemprop="description">
    Colour: Blue
    <br>
    Fabric: Cotton Silk
    <br>
    Type Of Work: Printed
    <br><br>
    Product colour may slightly vary due to photographic lighting sources or your monitor settings.
</span>

I want to parse the text between the line breaks and get each piece separately. The expected result is:

["Colour: Blue", "Fabric: Cotton Silk", "Product colour may slightly vary due to photographic lighting sources or your monitor settings."]

I have tried:

response.xpath('//*[@itemprop="description"]/text()').extract()

But this gives me the entire text in a single string.

How can I split the text around the <br> tags?

Answer 1

I tried your code and it appears to work. I made a small adjustment to clean up the extracted data with the re() method:

>>> sel.xpath('//span[@itemprop="description"]/text()').re(r"\s*(.+)\s*")
[u'Colour: Blue', u'Fabric: Cotton Silk', u'Type Of Work: Printed',  u'Product colour may slightly vary due to photographic lighting sources or your monitor settings.']

Is this what you need?
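If the regular expression feels opaque, the same cleanup can be done in plain Python after extraction. The following is only a minimal sketch: `clean_text_nodes` is a hypothetical helper name, and `raw_nodes` is an assumed approximation of what `extract()` might return for the HTML in the question (one string per text node, with surrounding whitespace), not captured output.

```python
def clean_text_nodes(nodes):
    """Strip whitespace from each text node and drop nodes that end up empty."""
    stripped = (node.strip() for node in nodes)
    return [text for text in stripped if text]


# Assumed stand-in for:
#   response.xpath('//*[@itemprop="description"]/text()').extract()
raw_nodes = [
    "\n    Colour: Blue\n    ",
    "\n    Fabric: Cotton Silk\n    ",
    "\n    Type Of Work: Printed\n    ",
    "\n    ",  # a whitespace-only node is discarded by the helper
    "\n    Product colour may slightly vary due to photographic lighting "
    "sources or your monitor settings.\n",
]

lines = clean_text_nodes(raw_nodes)
print(lines)
```

In a spider, the list returned by `extract()` would be passed to the helper in place of `raw_nodes`.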

Scrapy: parsing text between line breaks
