Sort XML with Python by tag

Time: 2016-10-19 13:34:24

Tags: python xml sorting

I have an XML document:

<root>
 <node1>
  <B>text</B>
  <A>another_text</A>
  <C>one_more_text</C>
 </node1>
 <node2>
  <C>one_more_text</C>
  <B>text</B>
  <A>another_text</A>
 </node2>
</root>

I want to get output like this:

<root>
 <node1>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node1>
 <node2>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node2>
</root>

I tried with some code like:

from xml.etree import ElementTree as et

tr = et.parse(path_in)
root = tr.getroot()
for children in root.getchildren():
    for child in children.getchildren():
        # sort it

tr.write(path_out)        

I cannot use the standard sort and sorted functions directly, because they sort the wrong way (not by tag). Thanks in advance.

2 answers:

Answer 0 (score: 2)

From a similar question:

from lxml import etree

data = """<X>
    <X03>3</X03>
    <X02>2</X02>
    <A>
        <A02>Y</A02>
        <A01>X</A01>
        <A03>Z</A03>
    </A>
    <X01>1</X01>
    <B>
        <B01>Z</B01>
        <B02>X</B02>
        <B03>C</B03>
    </B>
</X>"""

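doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))

# Select every element that has at least one child element
for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent, key=lambda x: x.tag)

# tostring() returns bytes on Python 3, so decode before printing
print(etree.tostring(doc, pretty_print=True).decode())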

Result:

<X>
  <A>
    <A01>X</A01>
    <A02>Y</A02>
    <A03>Z</A03>
  </A>
  <B>
    <B01>Z</B01>
    <B02>X</B02>
    <B03>C</B03>
  </B>
  <X01>1</X01>
  <X02>2</X02>
  <X03>3</X03>
</X>

You can find more information here: http://effbot.org/zone/element-sort.htm
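If lxml is not available, the same idea can be sketched with the standard library alone. This is a minimal sketch, not the code from the linked question: the helper name sort_all_children is hypothetical, and it relies on Element.iter() to visit every element instead of the XPath expression used above.

```python
from xml.etree import ElementTree as et


def sort_all_children(root):
    """Sort the children of every element in the tree by tag, in place.

    Hypothetical helper name, introduced only for this illustration.
    """
    # list(...) takes a snapshot of all elements first, so reordering
    # children does not interfere with the traversal.
    for parent in list(root.iter()):
        parent[:] = sorted(parent, key=lambda child: child.tag)
    return root


data = "<X><X03>3</X03><X02>2</X02><A><A02>Y</A02><A01>X</A01></A><X01>1</X01></X>"
root = sort_all_children(et.fromstring(data))
print(et.tostring(root).decode())
```

Unlike the lxml version, this does not pretty-print the result; the input here is written without whitespace so the output stays on one line.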

Answer 1 (score: 2)

You need to:

  • get the child elements of every top-level "node"
  • sort them by the tag attribute (the element's name)
  • replace the children of each top-level node with the sorted list

Sample working code:

from operator import attrgetter
from xml.etree import ElementTree as et

data = """<root>
 <node1>
  <B>text</B>
  <A>another_text</A>
  <C>one_more_text</C>
 </node1>
 <node2>
  <C>one_more_text</C>
  <B>text</B>
  <A>another_text</A>
 </node2>
</root>"""


root = et.fromstring(data)
for node in root.findall("*"):  # searching top-level nodes only: node1, node2 ...
    node[:] = sorted(node, key=attrgetter("tag"))

# tostring() returns bytes on Python 3, so decode before printing
print(et.tostring(root).decode())

Prints:

<root>
 <node1>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node1>
 <node2>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
  </node2>
</root>

Note that we are not using the getchildren() method here (it has been deprecated since Python 2.7 and was removed in Python 3.9); instead we rely on the fact that each Element instance is iterable over its child nodes.
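To tie this back to the file-based code in the question, the same steps could be wrapped around parse() and write(). This is only a sketch: the function name sort_file_by_tag is hypothetical, and the temporary files stand in for the question's path_in and path_out.

```python
import os
import tempfile
from operator import attrgetter
from xml.etree import ElementTree as et


def sort_file_by_tag(path_in, path_out):
    """Read path_in, sort each top-level node's children by tag, write path_out.

    Hypothetical helper name, introduced only for this illustration.
    """
    tree = et.parse(path_in)
    # Iterating an Element yields its direct children (node1, node2, ...)
    for node in tree.getroot():
        node[:] = sorted(node, key=attrgetter("tag"))
    tree.write(path_out)


# Demonstration with temporary files standing in for real paths
workdir = tempfile.mkdtemp()
path_in = os.path.join(workdir, "in.xml")
path_out = os.path.join(workdir, "out.xml")
with open(path_in, "w") as f:
    f.write("<root><node1><B>text</B><A>another_text</A>"
            "<C>one_more_text</C></node1></root>")

sort_file_by_tag(path_in, path_out)
with open(path_out) as f:
    result = f.read()
print(result)
```

Sorting only moves the elements; each element keeps the whitespace that followed it in the input, so the indentation of an indented file may look uneven after sorting.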