I want to merge some child elements of an XML file together. Here is my format:
<?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='360' left='113' width='440' height='147'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
</images></dataset>
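In the XML above, the box coordinates for the same image (9935.jpg in the sample) are specified twice, and I want to merge them into one. I want to remove the <image> tags that are repeated for the same image and collect all of the box coordinates for each image under a single <image> tag. I have never worked with XML before, so I am not sure the terminology I am using is correct. The desired output is: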
<?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box>
<box top='360' left='113' width='440' height='147'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
</images></dataset>
Answer 0 (score: 2)
You can try the xml.etree.ElementTree module:
import xml.etree.ElementTree as ET

tree = ET.parse('dataset.xml')
root = tree.getroot()
images = root.find('images')
file_dict = dict()
# Iterate over a copy so that removing elements does not skip any.
for image in list(images.findall('image')):
    file_str = image.get('file')
    if file_str in file_dict:
        # Move every <box> of the duplicate into the first <image> for this file.
        file_dict[file_str].extend(image.findall('box'))
        images.remove(image)  # remove the duplicate one
    else:
        file_dict[file_str] = image
print(ET.tostring(root, encoding='unicode'))
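The new root will look like this (the file names and coordinates below differ from the question's sample, so this output appears to come from a different test file):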
<dataset><images>
<image file="/home/user126043/Documents/testimages/9941.jpg">
<box height="147" left="113" top="360" width="440">
<label>Pirelli
</label></box></image>
<image file="/home/user126043/Documents/testimages/99.jpg">
<box height="276" left="247" top="160" width="228">
<label>Pirelli
</label></box><box height="276" left="247" top="439" width="506">
<label>Pirelli
</label></box></image>
</images></dataset>
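To check the merging logic without touching a real file, the same idea can be sketched against an in-memory string. The function name merge_duplicate_images, the SAMPLE data, and the file names a.jpg and b.jpg are all hypothetical and only serve as illustration; this is a minimal sketch that assumes every <image> sits directly under <images>.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a.jpg appears twice, b.jpg once.
SAMPLE = """<dataset><name>imglab dataset</name><images>
<image file='a.jpg'><box top='1' left='2' width='3' height='4'><label>Pirelli</label></box></image>
<image file='a.jpg'><box top='5' left='6' width='7' height='8'><label>Pirelli</label></box></image>
<image file='b.jpg'><box top='9' left='10' width='11' height='12'><label>Pirelli</label></box></image>
</images></dataset>"""


def merge_duplicate_images(root):
    """Merge <image> elements that share a 'file' attribute, in place."""
    images = root.find('images')
    first_seen = {}
    # Iterate over a copy so that removing elements does not disturb the loop.
    for image in list(images.findall('image')):
        key = image.get('file')
        if key not in first_seen:
            first_seen[key] = image
            continue
        # Move every <box> child of the duplicate, not just the first one.
        first_seen[key].extend(image.findall('box'))
        images.remove(image)
    return root


root = merge_duplicate_images(ET.fromstring(SAMPLE))
for image in root.iter('image'):
    print(image.get('file'), len(image.findall('box')))
# prints:
# a.jpg 2
# b.jpg 1
```

Keeping the first <image> element in a dictionary avoids building an XPath expression from the file name, which could break if a path contained a quote character. Note also that xml.etree.ElementTree skips processing instructions when parsing, so the <?xml-stylesheet ...?> line from the original file would have to be written back separately if it is needed in the output.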