Question

I have the following XML:

<test1>
    <test2>
       <text>This is a question on xpath
       </text>
    </test2>
    <test3>
        <test2>
            <text>Do not extract this
             </text>
        </test2>
    </test3>
</test1>

I need to extract the text in test2/text, but not when test2 comes inside test3. How can this be done in XPath? I tried using findall with something like:

for p in lxml_tree.xpath('.//test2',namespaces={'w':w}):
    for q in p.iterancestors():
        if q.tag=="test3":
           break
        else:
            text+= ''.join(t.text for t in p.xpath('.//text'))

But this does not work. I guess XPath has a better way to exclude it in a single expression.

Expected output:

text = "This is a question on xpath"

Answer 1

Assuming "comes inside" means an ancestor at any level, you can use not() with the ancestor axis to check that the node has no such parent or ancestor:

//test2[not(ancestor::test3)]/text

However, if you mean only that the immediate parent should not be test3, switch ancestor to parent:

//test2[not(parent::test3)]/text
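
As a rough check of the two expressions, here is a minimal sketch using lxml (the library the question already uses). The `extract` helper and the `DEEPER` document with its `wrapper` element are hypothetical, added only to show where the two axes differ; whitespace is stripped to match the expected output.

```python
from lxml import etree

# The XML from the question.
XML = """<test1>
    <test2>
       <text>This is a question on xpath
       </text>
    </test2>
    <test3>
        <test2>
            <text>Do not extract this
             </text>
        </test2>
    </test3>
</test1>"""

# Hypothetical variant: test2 sits below test3, but not as its direct child.
DEEPER = (
    "<test1><test3><wrapper><test2><text>nested</text></test2>"
    "</wrapper></test3></test1>"
)


def extract(xml, expr):
    """Join the stripped text of every <text> element matched by expr."""
    root = etree.fromstring(xml)
    return " ".join((el.text or "").strip() for el in root.xpath(expr))


NO_ANCESTOR = "//test2[not(ancestor::test3)]/text"
NO_PARENT = "//test2[not(parent::test3)]/text"

text = extract(XML, NO_ANCESTOR)
print(text)

# On the question's XML both expressions select the same node.
same = extract(XML, NO_PARENT)

# On the deeper variant only the ancestor test excludes the nested test2.
deep_ancestor = extract(DEEPER, NO_ANCESTOR)
deep_parent = extract(DEEPER, NO_PARENT)
```

For the question's document the two expressions behave the same, because the unwanted test2 is a direct child of test3. They diverge only when another element sits between test3 and test2, as in the hypothetical variant.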

