Question

I want to select all the text inside a div, regardless of the tags it contains.

<div>
<p>some text here <a href="">a link here  <span>span here</span></a></p>
</div>

I need to get this result:

some text here a link here

I tried:

response.xpath('//div/text()')

Answer 1

You want the string value of the div (the path /div assumes the div is the document root; use //div otherwise):

string(/div)

Or, if you want to trim whitespace from both ends and collapse runs of whitespace inside:

normalize-space(/div)
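
To make the difference between the two functions concrete, here is a minimal sketch using only the Python standard library. It does not evaluate XPath; it mimics what string() and normalize-space() compute, on the assumption that the inner span in the question was meant to be closed. Note that str.split() treats a few more characters as whitespace than XPath does, so this is an approximation.

```python
import xml.etree.ElementTree as ET

# The question's markup, with the inner span closed so it parses as XML.
HTML = """<div>
<p>some text here <a href="">a link here  <span>span here</span></a></p>
</div>"""

div = ET.fromstring(HTML)

# Like XPath string(): concatenate every descendant text node in document order.
string_value = "".join(div.itertext())

# Like XPath normalize-space(): trim both ends, collapse internal whitespace runs.
normalized = " ".join(string_value.split())

print(repr(string_value))
print(normalized)
```

The string value keeps the newlines and the double space from the markup, while the normalized form is a single clean line. Both include the span's text, since every descendant text node contributes.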

Answer 2

Try the XPath string() function:

response.xpath('string(//div)').extract_first()

Answer 3

The following selects every text node under the div, at any depth:

response.xpath('//div//text()')

Then try the following to get the desired output:

" ".join(t.strip() for t in response.xpath('//div//text()').extract() if t.strip())
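
Outside Scrapy, the same idea can be tried with lxml, where xpath() returns text nodes as plain strings. The sketch below is only an illustration: the names HTML, tree, direct and joined are made up for this example, and the markup assumes the inner span in the question was meant to be closed. It contrasts //div/text(), which sees only the div's own text children, with //div//text(), which sees text at every depth.

```python
from lxml import html

# The question's markup, with the inner span closed (an assumption).
HTML = """<div>
<p>some text here <a href="">a link here  <span>span here</span></a></p>
</div>"""

tree = html.fromstring(HTML)

# Single slash: only text nodes that are direct children of the div.
# Here that is at most the whitespace around <p>, which explains why
# the attempt in the question returned nothing useful.
direct = tree.xpath('//div/text()')

# Double slash: text nodes at any depth below the div.
pieces = tree.xpath('//div//text()')

# Strip each piece, drop the empty ones, and join with single spaces.
joined = " ".join(t.strip() for t in pieces if t.strip())

print([t for t in direct if t.strip()])
print(joined)
```

Stripping each piece before joining is what removes the double space and the surrounding newlines; joining without stripping would give roughly the same text as string(//div).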