Python lxml: how to traverse back up the tree

Time: 2015-11-20 18:11:57

Tags: python xml xpath lxml

I have the following Python code:

import lxml.etree

root = lxml.etree.parse("../../xml/test.xml")

path="./pages/page/paragraph[contains(text(),'ash')]"
para = root.xpath(path)

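Once I reach the para node, I don't want to go any further forward. Now I want to go back toward the root and look at all the <paragraph> nodes. Is there a way to go back up the tree?

Or look at it this way: I want the subtree between root and para. How can I do that?

For reference, here is the XML: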

<document>
    <pages>
        <page>
            <paragraph>XBV</paragraph>
            <paragraph>GFH</paragraph>
        </page>
        <page>
            <paragraph>ash</paragraph>
            <paragraph>lplp</paragraph>
        </page>
    </pages>
</document>

In this case, I want the nodes XBV and GFH. How can I get them?

2 answers:

Answer 0: (score: 2)

.. takes you one level up the tree.
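As a minimal sketch of that "one level up" step, the snippet below reuses the question's XML inline (instead of reading test.xml) and compares the XPath `..` step with lxml's `getparent()` and `iterancestors()` element methods. The variable names are only illustrative.

```python
import lxml.etree

xml = """<document>
    <pages>
        <page>
            <paragraph>XBV</paragraph>
            <paragraph>GFH</paragraph>
        </page>
        <page>
            <paragraph>ash</paragraph>
            <paragraph>lplp</paragraph>
        </page>
    </pages>
</document>"""

root = lxml.etree.fromstring(xml)

# xpath() returns a list, so take the first match as the context element.
para = root.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# ".." in XPath selects the parent of the context node.
parent_via_xpath = para.xpath("..")[0]

# getparent() is the element method for the same single step.
parent_via_api = para.getparent()

# iterancestors() walks from the parent up to the root element.
ancestor_tags = [el.tag for el in para.iterancestors()]

print(parent_via_xpath.tag, parent_via_api.tag)  # page page
print(ancestor_tags)  # ['page', 'pages', 'document']
```

Note that this only climbs the ancestor chain; it does not reach the paragraphs on earlier pages, which is why an axis such as preceding is needed for the XBV and GFH nodes.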

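However, I think the preceding axis is what you are looking for:

"The preceding axis indicates all the nodes that precede the context node in the document except any ancestor, attribute and namespace nodes."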

./pages/page/paragraph[contains(text(),'ash')]//preceding::paragraph

Sample code:

import lxml.etree


data = """
<document>
    <pages>

    <page>
       <paragraph>XBV</paragraph>

       <paragraph>GFH</paragraph>
    </page>

    <page>
       <paragraph>ash</paragraph>

       <paragraph>lplp</paragraph>
    </page>

    </pages>
</document>
"""

tree = lxml.etree.fromstring(data)
print([item.text for item in tree.xpath("./pages/page/paragraph[contains(text(),'ash')]//preceding::paragraph")])

Prints:

['XBV', 'GFH']
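Since the question already holds the matched node in para, the axis can also be evaluated relative to that element rather than repeating the whole path. The sketch below assumes the same XML as above and also shows preceding-sibling for contrast; in this setup a single preceding::paragraph step from the element appears to be enough.

```python
import lxml.etree

xml = """<document>
    <pages>
        <page>
            <paragraph>XBV</paragraph>
            <paragraph>GFH</paragraph>
        </page>
        <page>
            <paragraph>ash</paragraph>
            <paragraph>lplp</paragraph>
        </page>
    </pages>
</document>"""

root = lxml.etree.fromstring(xml)

# Step 1: find the node, exactly as in the question.
para = root.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# Step 2: evaluate the axis with that element as the context node.
# preceding:: covers everything earlier in document order, minus ancestors.
before = [p.text for p in para.xpath("preceding::paragraph")]

# preceding-sibling:: only looks at earlier children of the same <page>.
siblings_before = [p.text for p in para.xpath("preceding-sibling::paragraph")]

print(before)           # ['XBV', 'GFH']
print(siblings_before)  # []
```

The empty second list shows why a sibling axis is not sufficient here: the 'ash' paragraph is the first child of its page, and the wanted nodes live under a different page.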

Answer 1: (score: 1)

Go up, select all the preceding page nodes (pages only), take the paragraph nodes inside them, and extract their text:

>>> expression = "./pages/page/paragraph[contains(text(),'ash')]//preceding::page//paragraph"
>>> [i.text for i in root.xpath(expression)]
['XBV', 'GFH']
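One possible reason to go through preceding::page is that the results can be kept grouped by page. The following is a sketch under that assumption, again using the question's XML inline; the name grouped is hypothetical.

```python
import lxml.etree

xml = """<document>
    <pages>
        <page>
            <paragraph>XBV</paragraph>
            <paragraph>GFH</paragraph>
        </page>
        <page>
            <paragraph>ash</paragraph>
            <paragraph>lplp</paragraph>
        </page>
    </pages>
</document>"""

root = lxml.etree.fromstring(xml)
para = root.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# Pages that end before the matched paragraph starts. The paragraph's own
# page is an ancestor, so the preceding axis leaves it out.
pages_before = para.xpath("preceding::page")

# One inner list of paragraph texts per earlier page.
grouped = [[p.text for p in page.findall("paragraph")] for page in pages_before]

print(grouped)  # [['XBV', 'GFH']]
```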