Question

I want to select the following text:

Bold normal Italist

I need to select it and get back: Bold normal Italist.

The HTML is:

<a href=""><strong>Bold</strong> normal <i>Italist</i></a>

However, a/text() yields only

normal

Does anyone know a fix? I am testing scraping Bing, and the bold text is in a different position depending on the query.

Answer 1

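You can use a//text() instead of a/text() to get all of the text nodes. a/text() selects only the text nodes that are direct children of the a element, while a//text() also selects the text inside its descendants (here, strong and i).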

# -*- coding: utf-8 -*-
from scrapy.selector import Selector

doc = """
<a href=""><strong>Bold</strong> normal <i>Italist</i></a>
"""

sel = Selector(text=doc, type="html")

# a/text(): only the text nodes that are direct children of <a>
result = sel.xpath('//a/text()').extract()
print(result)
# >>> [' normal ']

# a//text(): every text node under <a>, joined into one string
result = ''.join(sel.xpath('//a//text()').extract())
print(result)
# >>> Bold normal Italist

Answer 2

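You can try using

string(a)

or

normalize-space(a)

Both return Bold normal Italist. Note that the a/string() form is XPath 2.0 syntax; Scrapy selectors are built on lxml, which implements XPath 1.0, so the function has to wrap the path rather than end it.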
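
To illustrate what string(a) and normalize-space(a) compute, here is a minimal sketch that uses only Python's standard library instead of Scrapy. It is not the answerer's code: xml.etree.ElementTree is swapped in for the selector, and it works here only because the snippet from the question happens to be well-formed XML. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The anchor from the question. It is well-formed XML, so the standard-library
# parser can read it; real-world HTML pages usually need an HTML parser instead.
a = ET.fromstring('<a href=""><strong>Bold</strong> normal <i>Italist</i></a>')

# itertext() yields the text of the element and all of its descendants in
# document order, which mirrors what a//text() selects and what string(a)
# concatenates.
pieces = list(a.itertext())
full_text = "".join(pieces)

# Rough equivalent of normalize-space(): trim both ends and collapse runs of
# whitespace into single spaces.
normalized = " ".join(full_text.split())

print(pieces)
print(full_text)
print(normalized)
```

The pieces list shows why a/text() alone misses "Bold" and "Italist": those strings belong to the strong and i children, not to the a element itself.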

Scrapy: how to get the correct selector
