Question

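I can't seem to find an existing topic that answers this question, so I'm asking it myself. Since this is a general question whose answer can be applied to most documents, I don't think a specific code example is needed.

Using XPath, I want to select all table nodes that do not nest other tables, i.e. that have no descendant table elements. I also want to discard all tables that contain only whitespace.

I tried this: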

//table[not(child::table) and normalize-space(.)]

But it doesn't work.

What is the correct way to do this?

Thanks.

Answer 1

Let's use the following HTML fragment as an example:

<div>
    <table id="1">

    </table>

    <table id="2">
        <table>
            <tr>
                <td>2</td>
            </tr>
        </table>
    </table>

    <table id="3">
        <div>I'm the one you wanted to find</div>
    </table>
</div>

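Based on your description, the first table should be discarded because it contains only whitespace, and the second table should also be discarded because it has another table inside it.

The following XPath expression matches only the third table: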

/div/table[(not(child::table) and normalize-space(.))]

Demo (using the xmllint tool):

$ xmllint index.html --xpath '/div/table[(not(child::table) and normalize-space(.))]'
<table id="3">
    <div>I'm the one you wanted to find</div>
</table>
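
For readers without xmllint, the same two conditions can be checked with a minimal Python sketch. It assumes the fragment is well-formed XML and uses only the standard library's xml.etree.ElementTree. That module's XPath subset has no not() or normalize-space(), so both predicates are written out in plain Python here rather than passed as an XPath string.

```python
import xml.etree.ElementTree as ET

# The HTML fragment from the answer above, parsed as XML.
HTML = """<div>
    <table id="1">

    </table>

    <table id="2">
        <table>
            <tr>
                <td>2</td>
            </tr>
        </table>
    </table>

    <table id="3">
        <div>I'm the one you wanted to find</div>
    </table>
</div>"""

root = ET.fromstring(HTML)  # root is the <div> element

matches = [
    table.get("id")
    for table in root.findall("table")        # /div/table
    if table.find("table") is None            # not(child::table)
    and "".join(table.itertext()).strip()     # normalize-space(.) is non-empty
]
print(matches)  # ['3']
```

Like the XPath expression, this only looks at direct children, so a table nested deeper (for example inside a td) would not be detected.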

Answer 2

Assuming you are scraping (X)HTML, and noting that a table cannot have another table as a direct child, you are probably looking for descendant table elements rather than direct child elements:

table[not(descendant::table)]

In the XML below:

<xml>
    <table id="hasDescendent">
        <tr>
            <td>
                <table id="Inner Descendent"/>
            </td>
        </tr>
    </table>
    <table id="directChild">
        <table id="Inner Direct Child" />
    </table>
    <table id="nochild">
    </table>
</xml>

the XPath //table[not(descendant::table)] returns the following tables:

Inner Descendent
Inner Direct Child
nochild
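
Note that the result still includes empty and whitespace-only tables. To also drop those, as the question asks, the two predicates could be combined as //table[not(descendant::table) and normalize-space(.)]. The sketch below mirrors that logic with Python's standard-library ElementTree, whose XPath subset supports neither predicate directly. The helper name leaf_tables and the extra table with id "withText" are assumptions added for illustration; they are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# The XML from the answer, plus one hypothetical non-empty table ("withText").
XML = """<xml>
    <table id="hasDescendent">
        <tr><td><table id="Inner Descendent"/></td></tr>
    </table>
    <table id="directChild">
        <table id="Inner Direct Child"/>
    </table>
    <table id="nochild">
    </table>
    <table id="withText">
        <tr><td>data</td></tr>
    </table>
</xml>"""


def leaf_tables(root, require_text=False):
    """Return the ids of <table> elements that have no descendant <table>.

    With require_text=True, tables whose text content is empty or
    whitespace-only are dropped as well.
    """
    result = []
    for table in root.iter("table"):
        # iter() yields the element itself first, so exclude it from the check.
        if any(d is not table for d in table.iter("table")):
            continue
        if require_text and not "".join(table.itertext()).strip():
            continue
        result.append(table.get("id"))
    return result


root = ET.fromstring(XML)
print(leaf_tables(root))
# ['Inner Descendent', 'Inner Direct Child', 'nochild', 'withText']
print(leaf_tables(root, require_text=True))
# ['withText']
```

With an XPath 1.0 engine such as xmllint, the combined expression can be used directly and no helper is needed.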

XPath - select elements that do not contain a given element
