Sorting an XML file by tag text using Python

Time: 2019-07-08 10:48:46

Tags: python xml sorting

I have an XML file like this:

<annotation>

        <object>
        <name>medium</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>267</xmin>
            <ymin>273</ymin>
            <xmax>415</xmax>
            <ymax>324</ymax>
        </bndbox>
    </object>
    <object>
        <name>medium</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>105</xmin>
            <ymin>229</ymin>
            <xmax>261</xmax>
            <ymax>292</ymax>
        </bndbox>
    </object>

</annotation>

I want to sort the XML in ascending order by the text of the ymin tag.
I am trying the following code, which raises "'NoneType' object is not iterable":

def getkey(elem):
    return elem.findtext("ymin")

tree = ET.parse("Train/5.xml")
container = tree.find("bndbox")           
container[:] = sorted(container, key=getkey)

In the final result, I want the second object tag to come before the first object tag.
How can I achieve this?

2 Answers:

Answer 0: (score: 0)

You can use iter to search recursively for 'ymin', then clear the existing objects and re-add the sorted list of object elements:

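import xml.etree.ElementTree as ET

# clear() and extend() are Element methods, so work on the root element
root = ET.parse("Train/5.xml").getroot()

# Sort by the integer value of the first <ymin> found under each <object>
objects = sorted(root.findall('object'),
                 key=lambda object_node: int(next(object_node.iter('ymin')).text))

root.clear()  # drops all children (and the root's own attributes and text)
root.extend(objects)

print(ET.tostring(root, encoding='unicode'))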

Output:

<annotation>
    <object>
        <name>medium</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>105</xmin>
            <ymin>229</ymin>
            <xmax>261</xmax>
            <ymax>292</ymax>
        </bndbox>
    </object>
    <object>
        <name>medium</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>267</xmin>
            <ymin>273</ymin>
            <xmax>415</xmax>
            <ymax>324</ymax>
        </bndbox>
    </object>
</annotation>
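
For reference, the code in the question fails because find("bndbox") only looks at direct children of the root, and bndbox is nested inside object, so it returns None. Below is a minimal sketch of the same sort written as a function that takes the XML as a string. The function name sort_objects_by_ymin is made up for illustration, and the sketch assumes every object has a bndbox/ymin child holding an integer. Unlike clear(), it leaves any non-object children of the root in place and appends the sorted objects after them.

```python
import xml.etree.ElementTree as ET


def sort_objects_by_ymin(xml_text):
    """Return the XML with <object> children ordered by ascending <ymin>.

    Hypothetical helper; assumes each <object> contains <bndbox><ymin>.
    """
    root = ET.fromstring(xml_text)  # root is the <annotation> element

    def ymin_of(obj):
        # A path is needed because <ymin> sits inside <bndbox>, not directly
        # under <object>; int() makes the comparison numeric, not textual.
        return int(obj.findtext("bndbox/ymin"))

    # Detach the <object> elements, then re-append them in sorted order.
    objects = root.findall("object")
    for obj in objects:
        root.remove(obj)
    root.extend(sorted(objects, key=ymin_of))

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<annotation>"
        "<object><bndbox><ymin>273</ymin></bndbox></object>"
        "<object><bndbox><ymin>229</ymin></bndbox></object>"
        "</annotation>"
    )
    print(sort_objects_by_ymin(sample))
```

Converting with int() matters once the values have different numbers of digits: as strings, "1000" sorts before "229".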

Answer 1: (score: 0)

Using BeautifulSoup (the 'xml' parser requires lxml to be installed):

data = '''

<annotation>

        <object>
        <name>medium</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>267</xmin>
            <ymin>273</ymin>
            <xmax>415</xmax>
            <ymax>324</ymax>
        </bndbox>
    </object>
    <object>
        <name>medium</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>105</xmin>
            <ymin>229</ymin>
            <xmax>261</xmax>
            <ymax>292</ymax>
        </bndbox>
    </object>

</annotation>'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'xml')
soup.annotation.contents = sorted(soup.annotation.select('object'), key=lambda k: int(k.select_one('ymin').text))


# For XML pretty print, sorted XML is inside `soup`
from xml.dom import minidom
xmlstr = minidom.parseString(str(soup)).toprettyxml(indent="  ").replace('\n\n', '').strip()
print(xmlstr)

Prints:

<?xml version="1.0" ?>
<annotation>
  <object>
        <name>medium</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>105</xmin>
            <ymin>229</ymin>
            <xmax>261</xmax>
            <ymax>292</ymax>
          </bndbox>
      </object>
  <object>
        <name>medium</name>
        <pose>Left</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>267</xmin>
            <ymin>273</ymin>
            <xmax>415</xmax>
            <ymax>324</ymax>
          </bndbox>
      </object>
</annotation>

Edit (reading from and writing to an XML file):

from bs4 import BeautifulSoup

with open('file.xml', 'r') as f_in:
    soup = BeautifulSoup(f_in.read(), 'xml')

soup.annotation.contents = sorted(soup.annotation.select('object'), key=lambda k: int(k.select_one('ymin').text))


# For XML pretty print, sorted XML is inside `soup`
from xml.dom import minidom
xmlstr = minidom.parseString(str(soup)).toprettyxml(indent="  ").replace('\n\n', '').strip()

with open('file_out.xml', 'w') as f_out:
    f_out.write(xmlstr)
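
If you would rather avoid the extra dependencies, the same read/sort/write round trip can probably be done with the standard library alone. This is only a sketch under some assumptions: the function name sort_xml_file and the file paths are hypothetical, every object is assumed to contain bndbox/ymin with an integer value, and ET.indent is only available from Python 3.9, so the sketch skips pretty-printing on older versions.

```python
import xml.etree.ElementTree as ET


def sort_xml_file(src_path, dst_path):
    """Sort the <object> elements of src_path by <ymin>, write to dst_path.

    Hypothetical helper; paths are supplied by the caller.
    """
    tree = ET.parse(src_path)
    root = tree.getroot()  # the <annotation> element

    # Detach the <object> elements and re-append them in ascending ymin order.
    objects = root.findall("object")
    for obj in objects:
        root.remove(obj)
    root.extend(sorted(objects, key=lambda o: int(o.findtext("bndbox/ymin"))))

    # ET.indent (Python 3.9+) rewrites the whitespace for even indentation.
    if hasattr(ET, "indent"):
        ET.indent(tree, space="    ")

    tree.write(dst_path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Assumed file names, matching the ones used in the answer above.
    sort_xml_file("file.xml", "file_out.xml")
```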