Question

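Can anyone suggest an XPath expression that returns a single string value containing the concatenated values of certain qualifying child nodes of an element, while ignoring the other child nodes? For example: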

<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>

The returned value should be a single string:

This text node should be returned. And the value of this element. And this.

Is this possible in a single XPath expression?

Thanks.

Answer 1

In XPath 2.0:

string-join(/*/node()[not(self::p)], '')
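To make clear what this expression keeps, here is a minimal Python sketch that emulates it by hand with the standard-library xml.etree.ElementTree module (which cannot evaluate XPath 2.0, so the selection is coded explicitly). The helper names join_children_except_p and normalize are hypothetical. It keeps every child node of the root except p elements, uses an element's full descendant text as its string value, and still keeps the text that follows a skipped element, because that text is a separate child node of the root.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = """<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>"""


def join_children_except_p(xml_text: str) -> str:
    """Hypothetical helper emulating string-join(/*/node()[not(self::p)], '')."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]  # text node before the first child element
    for child in root:
        if child.tag != "p":
            # An element's string value is all of its descendant text.
            parts.append("".join(child.itertext()))
        # child.tail is a text node child of root, so it is kept
        # even when the element itself is excluded.
        parts.append(child.tail or "")
    return "".join(parts)


def normalize(s: str) -> str:
    """Collapse whitespace runs, roughly like XPath normalize-space()."""
    return " ".join(s.split())


if __name__ == "__main__":
    print(normalize(join_children_except_p(SAMPLE)))
```

The raw result keeps the source indentation and newlines, just as string-join would; normalize only tidies them for display.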

Answer 2

In XPath 1.0:

You can use

/div//text()[not(parent::p)]

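to capture the useful text nodes. The concatenation itself cannot be done in XPath 1.0 (it has no function that joins an arbitrary node-set into one string), so I recommend doing it in the host application.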
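A minimal sketch of the host-application side, assuming Python with only the standard library. Because xml.etree.ElementTree cannot evaluate text() or parent:: in a predicate, the hypothetical helper text_nodes_not_in_p walks the tree and keeps each text node whose immediate parent is not p, mirroring /div//text()[not(parent::p)]; the joining is then ordinary Python.

```python
import xml.etree.ElementTree as ET
from typing import List

# Sample document from the question.
SAMPLE = """<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>"""


def text_nodes_not_in_p(xml_text: str) -> List[str]:
    """Hypothetical helper mirroring /div//text()[not(parent::p)].

    Returns the selected text nodes in document order.
    """
    root = ET.fromstring(xml_text)
    nodes: List[str] = []

    def walk(el: ET.Element) -> None:
        # el.text is the text node before el's first child; its parent is el.
        if el.text and el.tag != "p":
            nodes.append(el.text)
        for child in el:
            walk(child)
            # child.tail follows child's end tag; its parent is el, not child.
            if child.tail and el.tag != "p":
                nodes.append(child.tail)

    walk(root)
    return nodes


# XPath 1.0 only selects the nodes; the host language does the joining.
nodes = text_nodes_not_in_p(SAMPLE)
print(" ".join("".join(nodes).split()))
```

Since parent::p tests only the immediate parent, text inside an element nested in a p (for example <p><b>x</b></p>) would still be selected; not(ancestor::p) is the stricter XPath 1.0 predicate for that case.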

Answer 3

This seems to work:

With /div as the context node:

text() | em/text()

Or without a context node:

/div/text() | /div/em/text()

If you want to concatenate the first two strings (the first text node of div and the text of em), use:

concat(/div/text(), /div/em/text())
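A small Python sketch (standard library only, computed by hand rather than by an XPath engine) of what this concat call evaluates to. It relies on the XPath 1.0 rule that a node-set converted to a string yields the string value of its first node in document order, so only the first text node of div and the text of em are used, and "And this." is left out.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = """<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>"""

root = ET.fromstring(SAMPLE)  # root is the div element

# concat(/div/text(), /div/em/text()) in XPath 1.0: each node-set argument
# becomes the string value of its first node, so only these two are used.
first_div_text = root.text or ""
first_em_text = root.find("em").text or ""
result = first_div_text + first_em_text

# Collapse indentation and newlines for display.
print(" ".join(result.split()))
```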

Answer 4

/div//text()

The double slash selects text nodes at any depth, regardless of the intermediate nodes.
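For comparison, a Python sketch using only the standard library: ElementTree's Element.itertext() yields every descendant text fragment of an element in document order, which corresponds to what /div//text() selects. Running it on the question's sample shows that this selection also includes the text of the p element.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = """<div>
    This text node should be returned.
    <em>And the value of this element.</em>
    And this.
    <p>But this paragraph element should be ignored.</p>
</div>"""

root = ET.fromstring(SAMPLE)  # the div element

# itertext() yields root.text plus every descendant's text and tail in
# document order, i.e. the same text nodes that /div//text() selects.
all_text = "".join(root.itertext())
print(" ".join(all_text.split()))
```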

Answer 5

If you want all children except p, you can try the following:

    string-join(//*[name() != 'p']/text(), "")

This returns:

This text node should be returned.
And the value of this element.
And this.

Answer 6

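I know this comes a bit late, but I think my answer may still be useful. I recently ran into a similar problem, and because I was using Scrapy (on Python 3.6), whose XPath support does not include XPath 2.0, I couldn't use the string-join function suggested in some answers online.

I eventually found a simple workaround (shown below), but I haven't seen it in any Stack Overflow answer, which is why I'm sharing it.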

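# Inside a Scrapy callback, where response is the downloaded page:
temp_selector_list = response.xpath('/div')
# For each matched div, join all of its descendant text nodes into one string.
string_result = [''.join(x.xpath(".//text()").extract()) for x in temp_selector_list]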

Hope this helps!

Answer 7

You can also use a for-each loop and combine the values in a variable, like this:

<xsl:variable name="newstring">
  <xsl:for-each select="/div//text()">
    <xsl:value-of select="."/>
  </xsl:for-each>
</xsl:variable>

XPath: return the string concatenation of qualifying child node values
