Question

response.xpath('//*[@id="blah"]//text()')

Suppose my HTML is:

<p id="blah">This is a simple text <a href="#">foo</a> and this is after tag. </p>

What happens is that I get back a list of text fragments, even though everything sits inside a single <p> tag. For example:

[u'This is a simple text', u' and this is after tag.']

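My actual HTML content is large, and I have to join the fragments to get a single string. I also lose foo when I join. Is there a specific XPath or Scrapy mechanism for doing this?

I would like to get the result: This is a simple text foo and this is after tag.

Please note the foo.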

Thanks.

Answer 1

If it is XPath 2.0, you can use the string-join function:

response.xpath('string-join(//*[@id="blah"]//text(), "")')

Answer 2

You can get all the text nodes as a single string like this:

response.xpath('//*[@id="blah"]')[0].text_content()

Output:

'This is a simple text foo and this is after tag. '
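text_content() is a method of lxml's HTML elements. As a rough, dependency-free illustration of the same idea (concatenating every descendant text node in document order), here is a sketch using the standard library's xml.etree.ElementTree, where "".join(el.itertext()) plays a similar role. The wrapper <div> and the variable names are assumptions added for the example, and it only works here because the snippet happens to be well-formed XML. Real-world HTML usually needs an HTML parser such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <div> so the target element has a parent to search from
markup = (
    '<div>'
    '<p id="blah">This is a simple text <a href="#">foo</a> and this is after tag. </p>'
    '</div>'
)
root = ET.fromstring(markup)

# ElementTree supports a limited XPath subset, including attribute predicates
element = root.find(".//*[@id='blah']")

# itertext() yields the element's own text and the text of all its descendants
text = "".join(element.itertext())
print(repr(text))
```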