I came across HTML like this:
<span itemprop="description">
Colour: Blue
<br>
Fabric: Cotton Silk
<br>
Type Of Work: Printed
<br><br>
Product colour may slightly vary due to photographic lighting sources or your monitor settings.
</span>
I want to parse the text between the line breaks and get each piece separately. The desired result is:
["Colour: Blue", "Fabric: Cotton Silk", "Product colour may slightly vary due to photographic lighting sources or your monitor settings."]
I have already tried
response.xpath('//*[@itemprop="description"]/text()').extract()
but this puts all of the text into a single string.
How can I split the text around the "<br>" tags?
Answer 0 (score: 0)
I tried your code and it seems to work. I made a small tweak that uses the re() method to clean up the extracted data:
>>> sel.xpath('//span[@itemprop="description"]/text()').re(r"\s*(.+)\s*")
[u'Colour: Blue', u'Fabric: Cotton Silk', u'Type Of Work: Printed', u'Product colour may slightly vary due to photographic lighting sources or your monitor settings.']
Is this what you need?
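If the text really does arrive as one string (for example, when the HTML is processed outside Scrapy), it can also be split at the `<br>` tags directly. The following is a minimal sketch using only the standard-library `html.parser`. It assumes markup shaped like the snippet in the question. `BrSplitter` and `split_description` are hypothetical names, and nested `<span>` elements inside the description are not handled:

```python
from html.parser import HTMLParser


class BrSplitter(HTMLParser):
    """Collect text segments separated by <br> inside <span itemprop="description">.

    Hypothetical helper for illustration; assumes the target span has no nested <span>.
    """

    def __init__(self):
        super().__init__()
        self.inside = False  # True while parsing inside the target <span>
        self.current = []    # text pieces of the segment being built
        self.segments = []   # finished, non-empty segments

    def _flush(self):
        # Join the pieces, collapse runs of whitespace, and keep only non-empty text.
        text = " ".join("".join(self.current).split())
        if text:
            self.segments.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("itemprop", "description") in attrs:
            self.inside = True
        elif tag == "br" and self.inside:
            self._flush()  # a <br> ends the current segment

    def handle_endtag(self, tag):
        if tag == "span" and self.inside:
            self._flush()  # the closing </span> ends the last segment
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.current.append(data)


def split_description(html):
    """Return the text segments between <br> tags in the description span."""
    parser = BrSplitter()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Consecutive tags such as `<br><br>` produce no empty entries, because segments that are empty or contain only whitespace are discarded. Self-closing `<br/>` also works, since `HTMLParser` reports it through `handle_starttag` by default.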