Question

Please note: this question is a more precise version of a previous question.

I'm looking for an XPath that lets me find elements with a given plain text in an HTML document. For example, suppose I have the following HTML:

<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>

I need to search by text, and I can find <someElement> with the following XPath:

//*[contains(text(), 'This can be found')]
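To see why this expression finds <someElement> and nothing else, here is a minimal Python sketch (standard library only) that emulates //*[contains(text(), ...)] under XPath 1.0 rules, where contains() converts the text() node-set to a string using only the first text node in document order. It assumes the sample document is well-formed XML so that xml.etree.ElementTree can parse it; the helper names text_children and contains_text_xpath1 are hypothetical, and this is an emulation of the logic, not an XPath engine.

```python
import xml.etree.ElementTree as ET

# Sample document from the question; it happens to be well-formed XML,
# so the standard-library parser can read it (real-world HTML often is not).
DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>"""


def text_children(elem):
    """Return elem's direct text-node children in document order.

    ElementTree keeps the text before the first child in elem.text and the
    text after each child in that child's .tail attribute.
    """
    nodes = [elem.text] if elem.text else []
    nodes += [child.tail for child in elem if child.tail]
    return nodes


def contains_text_xpath1(root, needle):
    """Emulate //*[contains(text(), needle)] with XPath 1.0 conversion rules.

    The text() node-set is turned into a string by taking only its first
    text node, so text that continues after a child element is ignored.
    """
    hits = []
    for elem in root.iter():  # root.iter() includes root itself, like //*
        nodes = text_children(elem)
        first = nodes[0] if nodes else ""
        if needle in first:
            hits.append(elem.tag)
    return hits


root = ET.fromstring(DOC)
print(contains_text_xpath1(root, "This can be found"))      # ['someElement']
print(contains_text_xpath1(root, "This can not be found"))  # []
```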

I'm looking for a similar XPath that lets me find <someOtherElement> and <yetAnotherElement> by the plain text "This can not be found". The following does not work:

//*[contains(text(), 'This can not be found')]

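I understand that this is because the nested em element "breaks" the text flow of "This can not be found". Is it possible, in some way, to make XPath ignore this kind of nesting or similar?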
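The split can be made visible with a short Python sketch using the standard library's xml.etree.ElementTree (parsing the snippet as XML is an assumption). ElementTree stores the text before a child in .text and the text after it in the child's .tail, which mirrors the two separate text nodes that text() selects, while itertext() joins all descendant text the way XPath's string value does.

```python
import xml.etree.ElementTree as ET

# yetAnotherElement from the question, parsed on its own.
elem = ET.fromstring(
    "<yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>"
)

# Direct text-node children: the text before <em> and the text after it
# (the latter is stored as the <em> child's tail in ElementTree).
text_nodes = [elem.text] + [child.tail for child in elem]
print(text_nodes)  # ['This can ', ' be found']

# The string value concatenates every descendant text node,
# including the "not" inside <em>.
string_value = "".join(elem.itertext())
print(string_value)  # This can not be found

# No single text node holds the phrase, but the string value does.
print(any("This can not be found" in t for t in text_nodes))  # False
print("This can not be found" in string_value)               # True
```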

Answer 1

You can use:

//*[contains(., 'This can not be found')]
   [not(.//*[contains(., 'This can not be found')])]

This XPath consists of two parts:

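//*[contains(., 'This can not be found')]: Passing . (the context node) to contains() converts it to its string value, i.e. the concatenation of all of its descendant text nodes. This part therefore selects every element whose string value contains "This can not be found". In the example above, these are <someOtherElement> and <yetAnotherElement>, but also <nested>, <body> and <html>.
[not(.//*[contains(., 'This can not be found')])]: This removes every element that has a descendant element whose string value also contains "This can not be found". In the example above, it removes the unwanted elements <nested>, <body> and <html>, leaving only the innermost matches.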
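The two parts can be emulated with a minimal standard-library Python sketch, assuming the sample document parses as XML. string_value, part_one and innermost are hypothetical helper names; the code mirrors the logic of the expression rather than evaluating XPath (with the third-party lxml package, an element's xpath() method would evaluate the expression directly).

```python
import xml.etree.ElementTree as ET

# Sample document from the question (assumed to be well-formed XML).
DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>"""

NEEDLE = "This can not be found"


def string_value(elem):
    """XPath string value of an element: all descendant text in document order."""
    return "".join(elem.itertext())


def part_one(root, needle):
    """Emulate //*[contains(., needle)]: every element whose string value matches."""
    return [e for e in root.iter() if needle in string_value(e)]


def innermost(root, needle):
    """Emulate the full expression by adding [not(.//*[contains(., needle)])].

    An element is kept only if none of its descendant elements
    (e.iter() minus e itself) also has a matching string value.
    """
    return [
        e
        for e in part_one(root, needle)
        if not any(needle in string_value(d) for d in e.iter() if d is not e)
    ]


root = ET.fromstring(DOC)
print([e.tag for e in part_one(root, NEEDLE)])
# ['html', 'body', 'nested', 'someOtherElement', 'yetAnotherElement']
print([e.tag for e in innermost(root, NEEDLE)])
# ['someOtherElement', 'yetAnotherElement']
```

Rescanning each candidate's descendants is quadratic in the worst case, which is fine for a small document; it is simply the most literal translation of the not(.//*[...]) predicate.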

You can try out these XPath expressions here.

XPath: Find HTML element by plain text
