I have an XML file like this:
<annotation>
<folder>Test</folder>
<filename>10 2019-02-06_20-32.png</filename>
<source>
<database>undefined</database>
</source>
<size>
<width>768</width>
<height>574</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>low</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>360</xmin>
<ymin>38</ymin>
<xmax>434</xmax>
<ymax>113</ymax>
</bndbox>
</object>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>227</xmin>
<ymin>128</ymin>
<xmax>290</xmax>
<ymax>200</ymax>
</bndbox>
</object>
<object>
<name>low</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox> <!-- duplicate -->
<xmin>360</xmin>
<ymin>38</ymin>
<xmax>434</xmax>
<ymax>113</ymax>
</bndbox>
</object>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox> <!-- duplicate -->
<xmin>227</xmin>
<ymin>128</ymin>
<xmax>290</xmax>
<ymax>200</ymax>
</bndbox>
</object>
</annotation>
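In this example two of the elements are repeated. How can I remove every "object" element that duplicates an earlier one?
How do I detect whether a duplicate exists, and how do I delete it once I have found it?
Thanks for your answers.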
Answer 0 (score: 0)
You can use BeautifulSoup and its extract() method. The script below prints the document with the repeated <object> elements removed:
data = '''<annotation>
<folder>Test</folder>
<filename>10 2019-02-06_20-32.png</filename>
<source>
<database>undefined</database>
</source>
<size>
<width>768</width>
<height>574</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>low</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>360</xmin>
<ymin>38</ymin>
<xmax>434</xmax>
<ymax>113</ymax>
</bndbox>
</object>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>227</xmin>
<ymin>128</ymin>
<xmax>290</xmax>
<ymax>200</ymax>
</bndbox>
</object>
<object>
<name>low</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>360</xmin>
<ymin>38</ymin>
<xmax>434</xmax>
<ymax>113</ymax>
</bndbox>
</object>
<object>
<name>medium</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>227</xmin>
<ymin>128</ymin>
<xmax>290</xmax>
<ymax>200</ymax>
</bndbox>
</object>
</annotation>'''
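from bs4 import BeautifulSoup

# Python's built-in html.parser is enough for this simple markup
soup = BeautifulSoup(data, 'html.parser')

seen = set()  # <object> tags already encountered
for obj in soup.select('object'):
    # bs4 tags compare equal when name, attributes and contents match
    if obj not in seen:
        seen.add(obj)  # first occurrence: remember it and keep it
        continue
    obj.extract()  # repeated occurrence: detach it from the tree

print(soup)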
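If you would rather not depend on the markup being textually identical (whitespace included), one possible variation is to compare only the label and the box coordinates. The sketch below uses the standard library's xml.etree.ElementTree instead of BeautifulSoup. The helper names object_key and remove_duplicate_objects are made up for illustration, and the code assumes every <object> has a <name> and a <bndbox> with four integer coordinates, as in the question.

```python
import xml.etree.ElementTree as ET


def object_key(obj):
    """Hypothetical helper: hashable key from the label and the box coordinates."""
    box = obj.find("bndbox")  # assumed to exist on every <object>
    coords = tuple(int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
    return (obj.findtext("name"), coords)


def remove_duplicate_objects(root):
    """Hypothetical helper: drop repeated <object> children in place, return the count."""
    seen = set()
    removed = 0
    for obj in list(root.findall("object")):  # iterate over a copy while mutating root
        key = object_key(obj)
        if key in seen:
            root.remove(obj)  # same label and box as an earlier object
            removed += 1
        else:
            seen.add(key)
    return removed


# Shortened stand-in for the annotation file in the question.
sample = """<annotation>
  <object><name>low</name><bndbox><xmin>360</xmin><ymin>38</ymin><xmax>434</xmax><ymax>113</ymax></bndbox></object>
  <object><name>medium</name><bndbox><xmin>227</xmin><ymin>128</ymin><xmax>290</xmax><ymax>200</ymax></bndbox></object>
  <object><name>low</name><bndbox><xmin>360</xmin><ymin>38</ymin><xmax>434</xmax><ymax>113</ymax></bndbox></object>
</annotation>"""

root = ET.fromstring(sample)
removed = remove_duplicate_objects(root)
print(removed)  # a non-zero count means duplicates were present
print(ET.tostring(root, encoding="unicode"))
```

The return value answers the "is there a duplicate?" part of the question, and the tree itself is cleaned in the same pass. If pose, truncated or difficult should also count when deciding whether two objects are the same, they would need to be added to the key.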