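Searching an entire tree with etree

Date: 2016-01-01 21:08:52

Tags: python xml

I'm using xml.etree.ElementTree as ET, which seems to be the go-to library, but if something else does the job better, I'm interested.

Say I have a tree: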

doc = """
<top>
<second>
<third>
    <subthird></subthird>
    <subthird2>
         <subsubthird>findme</subsubthird>
    </subthird2>
</third>
</second>
</top>"""

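For the sake of this question, assume it has already been parsed into a tree named myTree.

I want to update findme to found. Is there a simple way to do it other than walking down the tree by hand like this?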

myTree.getroot().getchildren()[0].getchildren()[0].getchildren() \
    [1].getchildren()[0].text = 'found'

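The problem is that I have a large XML tree in which I want to update values like this one, and I can't find a clear, Pythonic way to do it.

2 answers:

Answer 0: (score: 1)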

You can use XPath expressions to find elements by a specific tag name, wherever they sit in the tree.
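As a minimal sketch of that idea, assuming myTree is an ElementTree built from the question's document (the way it is constructed here is an assumption, not part of the question), findall with a './/' path matches the tag at any depth:

```python
import xml.etree.ElementTree as ET

doc = """
<top>
<second>
<third>
    <subthird></subthird>
    <subthird2>
         <subsubthird>findme</subsubthird>
    </subthird2>
</third>
</second>
</top>"""

# Assumption: this is how myTree from the question was created.
myTree = ET.ElementTree(ET.fromstring(doc))

# './/subsubthird' matches every <subsubthird> at any depth below the root.
for elem in myTree.getroot().findall('.//subsubthird'):
    elem.text = 'found'

print(ET.tostring(myTree.getroot(), encoding='unicode'))
```

The path starts with '.' because ElementTree does not accept an absolute path such as '//subsubthird' when searching from an element.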

If you need to find all tags that have a particular text value, take a look at this answer: Find element by text with XPath in ElementTree

Answer 1: (score: 0)
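When the tag name is not known and only the text value is, one simple option that avoids XPath entirely is to walk every element and compare its text. This is only a sketch: the small document and the updated counter below are made up for illustration, and it is not necessarily the approach the linked answer takes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with the same text value under different tags.
root = ET.fromstring(
    "<top><a>findme</a><b><c>findme</c><d>keep</d></b></top>"
)

updated = 0
# iter() visits the root and every descendant in document order.
for elem in root.iter():
    if elem.text == 'findme':
        elem.text = 'found'
        updated += 1

print(updated)
print(ET.tostring(root, encoding='unicode'))
```

This compares the text exactly, so a value surrounded by whitespace would need elem.text.strip() (with a None check) before the comparison.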

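I use lxml with XPath expressions. ElementTree supports an abbreviated XPath syntax, but since I don't use it, I don't know how extensive it is. The nice thing about XPath is that you can write element selectors as complex as you need. In this example, the selector is based on nesting: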

import lxml.etree 

doc = """
<top>
<second>
<third>
    <subthird></subthird>
    <subthird2>
         <subsubthird>findme</subsubthird>
    </subthird2>
</third>
</second>
</top>"""

root = lxml.etree.XML(doc)
for elem in root.xpath('second/third/subthird2/subsubthird'):
    elem.text = 'found'

print(lxml.etree.tostring(root, pretty_print=True, encoding='unicode'))

But suppose there were some other identifying feature, such as a unique attribute,

<subthird2 class="foo"><subsubthird>findme</subsubthird></subthird2>

then the XPath would be //subthird2[@class="foo"]/subsubthird
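For comparison, the attribute predicate also appears to fall within the XPath subset that the standard library's ElementTree understands, so a rough equivalent without lxml might look like the sketch below. The two-element document is invented for illustration, and the path starts with './/' because ElementTree rejects absolute paths on an element. With lxml, the expression above would be passed to root.xpath() as in the earlier example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two <subthird2> elements that differ only by class.
root = ET.fromstring(
    '<top>'
    '<subthird2 class="foo"><subsubthird>findme</subsubthird></subthird2>'
    '<subthird2 class="bar"><subsubthird>findme</subsubthird></subthird2>'
    '</top>'
)

# Only the <subsubthird> under the class="foo" parent is selected.
for elem in root.findall(".//subthird2[@class='foo']/subsubthird"):
    elem.text = 'found'

texts = [e.text for e in root.iter('subsubthird')]
print(texts)
```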