I have an XML file like this:
<annotation>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>267</xmin>
<ymin>273</ymin>
<xmax>415</xmax>
<ymax>324</ymax>
</bndbox>
</object>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>105</xmin>
<ymin>229</ymin>
<xmax>261</xmax>
<ymax>292</ymax>
</bndbox>
</object>
</annotation>
I want to sort the XML in ascending order by the text of the ymin tag.
I am trying the following code, which throws "'NoneType' object is not iterable".
import xml.etree.ElementTree as ET

def getkey(elem):
    return elem.findtext("ymin")

tree = ET.parse("Train/5.xml")
container = tree.find("bndbox")
container[:] = sorted(container, key=getkey)
I expect the second object tag to appear before the first object tag in the final result.
How can I achieve this?
Answer 0 (score: 0)
You can use iter to search recursively for 'ymin', then clear the existing objects and re-add the sorted list of objects:
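import xml.etree.ElementTree as ET

# clear() and extend() are Element methods, so work on the root element
root = ET.parse("Train/5.xml").getroot()
objects = sorted(root.findall('object'),
                 key=lambda object_node: int(next(object_node.iter('ymin')).text))
root.clear()
root.extend(objects)
print(ET.tostring(root, encoding='unicode'))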
Output:
<annotation>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>105</xmin>
<ymin>229</ymin>
<xmax>261</xmax>
<ymax>292</ymax>
</bndbox>
</object>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>267</xmin>
<ymin>273</ymin>
<xmax>415</xmax>
<ymax>324</ymax>
</bndbox>
</object>
</annotation>
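To make the cause of the original error concrete, here is a minimal, self-contained sketch. It assumes the XML is already in a string (`XML_TEXT` is a shortened stand-in for the file, and `sort_objects_by_ymin` is a hypothetical helper name, not something from the question). A bare tag name passed to `find` only matches direct children, which is why `find("bndbox")` returned `None` in the question; a relative path such as `bndbox/ymin` reaches the nested value from each object.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the annotation file from the question
XML_TEXT = """<annotation>
<object><name>medium</name><bndbox><xmin>267</xmin><ymin>273</ymin><xmax>415</xmax><ymax>324</ymax></bndbox></object>
<object><name>medium</name><bndbox><xmin>105</xmin><ymin>229</ymin><xmax>261</xmax><ymax>292</ymax></bndbox></object>
</annotation>"""


def sort_objects_by_ymin(root):
    """Reorder the <object> children of root in place, ascending by bndbox/ymin."""
    objects = root.findall("object")
    # Compare as integers; comparing the raw text would sort "99" after "100"
    objects.sort(key=lambda obj: int(obj.findtext("bndbox/ymin")))
    # Remove only the <object> children, then re-append them in sorted order
    for obj in objects:
        root.remove(obj)
    root.extend(objects)
    return root


root = ET.fromstring(XML_TEXT)
print(root.find("bndbox"))  # None: <bndbox> is not a direct child of <annotation>

sort_objects_by_ymin(root)
ymins = [int(obj.findtext("bndbox/ymin")) for obj in root.findall("object")]
print(ymins)  # [229, 273]
```

Unlike `clear()`, removing and re-appending only the `object` elements leaves any attributes or other children of the root untouched, which may matter if the real files contain more than the sample shows.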
Answer 1 (score: 0)
Using BeautifulSoup:
data = '''
<annotation>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>267</xmin>
<ymin>273</ymin>
<xmax>415</xmax>
<ymax>324</ymax>
</bndbox>
</object>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>105</xmin>
<ymin>229</ymin>
<xmax>261</xmax>
<ymax>292</ymax>
</bndbox>
</object>
</annotation>'''
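from bs4 import BeautifulSoup
from xml.dom import minidom

soup = BeautifulSoup(data, 'xml')  # the 'xml' parser needs lxml installed
annotation = soup.annotation
# Sort the <object> tags by the integer value of their <ymin> descendant
objects = sorted(annotation.find_all('object'), key=lambda k: int(k.select_one('ymin').text))
# Detach the existing children, then re-append the objects in sorted order
annotation.clear()
annotation.extend(objects)
# For XML pretty print, sorted XML is inside `soup`
xmlstr = minidom.parseString(str(soup)).toprettyxml(indent=" ").replace('\n\n', '').strip()
print(xmlstr)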
Prints:
<?xml version="1.0" ?>
<annotation>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>105</xmin>
<ymin>229</ymin>
<xmax>261</xmax>
<ymax>292</ymax>
</bndbox>
</object>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>267</xmin>
<ymin>273</ymin>
<xmax>415</xmax>
<ymax>324</ymax>
</bndbox>
</object>
</annotation>
Edit (reading from and writing to an XML file):
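from bs4 import BeautifulSoup
from xml.dom import minidom

# Read and parse the source file
with open('file.xml', 'r') as f_in:
    soup = BeautifulSoup(f_in.read(), 'xml')
annotation = soup.annotation
# Sort the <object> tags by the integer value of their <ymin> descendant
objects = sorted(annotation.find_all('object'), key=lambda k: int(k.select_one('ymin').text))
# Detach the existing children, then re-append the objects in sorted order
annotation.clear()
annotation.extend(objects)
# For XML pretty print, sorted XML is inside `soup`
xmlstr = minidom.parseString(str(soup)).toprettyxml(indent=" ").replace('\n\n', '').strip()
# Write the sorted, pretty-printed XML to a new file
with open('file_out.xml', 'w') as f_out:
    f_out.write(xmlstr)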
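If installing BeautifulSoup and lxml is not an option, the same read, sort and write round trip can be sketched with the standard library alone. This is only an illustration under some assumptions: `sort_annotation_file` is a hypothetical helper name, the demo writes a shortened sample into a temporary directory instead of touching real files, and `ET.indent` requires Python 3.9 or newer.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def sort_annotation_file(src_path, dst_path):
    """Sort the root's <object> children by bndbox/ymin and write the result to dst_path."""
    tree = ET.parse(src_path)
    root = tree.getroot()
    objects = sorted(root.findall("object"),
                     key=lambda obj: int(obj.findtext("bndbox/ymin")))
    # Remove only the <object> children, then re-append them in sorted order
    for obj in objects:
        root.remove(obj)
    root.extend(objects)
    ET.indent(tree, space="  ")  # pretty-print in place (Python 3.9+)
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)


# Demo on a shortened sample written to a temporary directory
sample = (
    "<annotation>"
    "<object><bndbox><ymin>273</ymin></bndbox></object>"
    "<object><bndbox><ymin>229</ymin></bndbox></object>"
    "</annotation>"
)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.xml")
    dst = os.path.join(tmp, "file_out.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write(sample)
    sort_annotation_file(src, dst)
    result_root = ET.parse(dst).getroot()
    sorted_ymins = [int(o.findtext("bndbox/ymin")) for o in result_root.findall("object")]

print(sorted_ymins)  # [229, 273]
```

Writing to a separate output path, as the answer above does, keeps the original annotation file intact in case the sort needs to be rerun.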