I have an XML document:
<root>
<node1>
<B>text</B>
<A>another_text</A>
<C>one_more_text</C>
</node1>
<node2>
<C>one_more_text</C>
<B>text</B>
<A>another_text</A>
</node2>
</root>
I want to get output like this:
<root>
<node1>
<A>another_text</A>
<B>text</B>
<C>one_more_text</C>
</node1>
<node2>
<A>another_text</A>
<B>text</B>
<C>one_more_text</C>
</node2>
</root>
I tried some code like this:
from xml.etree import ElementTree as et
tr = et.parse(path_in)
root = tr.getroot()
for children in root.getchildren():
    for child in children.getchildren():
        pass  # sort it
tr.write(path_out)
I cannot use the standard sort and sorted functions because they sort the wrong way (not by tag).
Thanks in advance.
Answer 0 (score: 2)
From a similar question:
from lxml import etree
data = """<X>
<X03>3</X03>
<X02>2</X02>
<A>
<A02>Y</A02>
<A01>X</A01>
<A03>Z</A03>
</A>
<X01>1</X01>
<B>
<B01>Z</B01>
<B02>X</B02>
<B03>C</B03>
</B>
</X>"""
doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))
for parent in doc.xpath('//*[./*]'):  # Search for parent elements
    parent[:] = sorted(parent, key=lambda x: x.tag)
print(etree.tostring(doc, pretty_print=True).decode())
Result:
<X>
<A>
<A01>X</A01>
<A02>Y</A02>
<A03>Z</A03>
</A>
<B>
<B01>Z</B01>
<B02>X</B02>
<B03>C</B03>
</B>
<X01>1</X01>
<X02>2</X02>
<X03>3</X03>
</X>
You can find more information here: http://effbot.org/zone/element-sort.htm
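If lxml is not available, the same "sort every parent" idea can be sketched with the standard library alone. This is only a minimal sketch: the helper name sort_children_by_tag and the sample string are made up for illustration, and the input is assumed to contain no significant whitespace between elements.

```python
from xml.etree import ElementTree as ET


def sort_children_by_tag(root):
    """Sort the children of root and of every descendant by tag, in place.

    Hypothetical helper, not part of ElementTree.
    """
    # list() fixes the set of elements to visit before anything is reordered.
    for parent in list(root.iter()):
        parent[:] = sorted(parent, key=lambda child: child.tag)
    return root


# Made-up sample data without whitespace between the elements.
doc = ET.fromstring("<X><X02>2</X02><A><A02>Y</A02><A01>X</A01></A><X01>1</X01></X>")
sort_children_by_tag(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Unlike the XPath version, this visits leaf elements too, which is harmless because sorting an empty child list changes nothing.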
Answer 1 (score: 2)
You need to sort the children of each top-level node by their tag attribute (the node's name).
Sample working code:
from operator import attrgetter
from xml.etree import ElementTree as et
data = """ <root>
<node1>
<B>text</B>
<A>another_text</A>
<C>one_more_text</C>
</node1>
<node2>
<C>one_more_text</C>
<B>text</B>
<A>another_text</A>
</node2>
</root>"""
root = et.fromstring(data)
for node in root.findall("*"):  # searching top-level nodes only: node1, node2 ...
    node[:] = sorted(node, key=attrgetter("tag"))
print(et.tostring(root, encoding="unicode"))
Prints:
<root>
<node1>
<A>another_text</A>
<B>text</B>
<C>one_more_text</C>
</node1>
<node2>
<A>another_text</A>
<B>text</B>
<C>one_more_text</C>
</node2>
</root>
Note that we are not using the getchildren() method here (it has been deprecated since Python 2.7 and was removed in Python 3.9). Instead, we rely on the fact that each Element instance is iterable over its child nodes.
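To make the iteration point concrete, here is a small sketch on a single node. The sample string is made up for illustration; the names tags_before and tags_after are just local variables.

```python
from xml.etree import ElementTree as ET

node = ET.fromstring(
    "<node1><B>text</B><A>another_text</A><C>one_more_text</C></node1>"
)

# Iterating an Element yields its direct children, so no getchildren() is needed.
tags_before = [child.tag for child in node]

# Slice assignment replaces all children at once with the sorted sequence.
node[:] = sorted(node, key=lambda child: child.tag)
tags_after = [child.tag for child in node]

print(tags_before)
print(tags_after)
```

Where a real list of children is needed, list(node) gives the same result that getchildren() used to return.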