Changing a specific repeated element in an .xml file with Python

Date: 2012-07-26 19:28:32

Tags: python xml parsing lxml

I have the following .xml file that I would like to manipulate:

<html>
  <A>
    <B>
      <C>
        <D>
          <TYPE>
            <NUMBER>7297</NUMBER>
            <DATA />
          </TYPE>
          <TYPE>
            <NUMBER>7721</NUMBER>
            <DATA>A=1,B=2,C=3,</DATA>
          </TYPE>
        </D>
      </C>
    </B>
  </A>
</html>

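I want to change the text inside the <DATA> element that sits in the same <TYPE> as <NUMBER>7721</NUMBER>. How do I do that? If I use find() or findtext(), I can only reach the first match.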

1 Answer:

Answer 0: (score: 3)

XPath is a good fit for this kind of thing. //TYPE[NUMBER='7721' and DATA] finds every TYPE node that has at least one NUMBER child whose text is "7721" and at least one DATA child:

from lxml import etree

xmlstr = """<html>
  <A>
    <B>
      <C>
        <D>
          <TYPE>
            <NUMBER>7297</NUMBER>
            <DATA />
          </TYPE>
          <TYPE>
            <NUMBER>7721</NUMBER>
            <DATA>A=1,B=2,C=3,</DATA>
          </TYPE>
        </D>
      </C>
    </B>
  </A>
</html>"""

html_element = etree.fromstring(xmlstr)

# find all the TYPE nodes that have NUMBER=7721 and DATA nodes
type_nodes = html_element.xpath("//TYPE[NUMBER='7721' and DATA]")

# the for loop is probably superfluous, but who knows, there might be more than one!
for t in type_nodes:
    d = t.find('DATA')
    # example: append spamandeggs to the end of the data text
    if d.text is None:
        d.text = 'spamandeggs'
    else:
        d.text += 'spamandeggs'
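# encoding='unicode' makes tostring() return text rather than bytes on Python 3
print(etree.tostring(html_element, encoding='unicode'))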

Output:

<html>
  <A>
    <B>
      <C>
        <D>
          <TYPE>
            <NUMBER>7297</NUMBER>
            <DATA/>
          </TYPE>
          <TYPE>
            <NUMBER>7721</NUMBER>
            <DATA>A=1,B=2,C=3,spamandeggs</DATA>
          </TYPE>
        </D>
      </C>
    </B>
  </A>
</html>
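
If lxml is not available, roughly the same edit can be sketched with the standard library's xml.etree.ElementTree. This is a minimal sketch and not part of the original answer: the helper name append_to_data and the compacted XML string are assumptions made for illustration. ElementTree implements only a subset of XPath, so the sketch uses a [NUMBER='7721'] predicate and steps straight to the DATA child instead of the "and" expression above. It also relies on findall(), which returns every match, where find() and findtext() stop at the first one.

```python
import xml.etree.ElementTree as ET

# Same structure as the question's file, compacted for brevity.
XML = """<html><A><B><C><D>
  <TYPE><NUMBER>7297</NUMBER><DATA /></TYPE>
  <TYPE><NUMBER>7721</NUMBER><DATA>A=1,B=2,C=3,</DATA></TYPE>
</D></C></B></A></html>"""


def append_to_data(root, number, suffix):
    """Append suffix to every DATA whose sibling NUMBER has the given text.

    Hypothetical helper for illustration; returns the number of DATA
    elements that were changed.
    """
    # findall() returns all matches, unlike find()/findtext().
    matches = root.findall(".//TYPE[NUMBER='%s']/DATA" % number)
    for data in matches:
        # An empty element such as <DATA /> has text None, so fall back to "".
        data.text = (data.text or "") + suffix
    return len(matches)


root = ET.fromstring(XML)
changed = append_to_data(root, "7721", "spamandeggs")
print(changed)
print(ET.tostring(root, encoding="unicode"))
```

Selecting DATA directly in the path means TYPE elements without a DATA child are skipped, which plays the same role as the "and DATA" condition in the lxml expression.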