How to remove all duplicate elements in XML markup [python]

Time: 2019-07-12 15:31:20

Tags: python xml duplicates

I have an XML example like this:

<annotation>
<folder>Test</folder>
<filename>10 2019-02-06_20-32.png</filename>
<source>
    <database>undefined</database>
</source>
<size>
    <width>768</width>
    <height>574</height>
    <depth>3</depth>
</size>
<segmented>0</segmented>
<object>
    <name>low</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>360</xmin>
        <ymin>38</ymin>
        <xmax>434</xmax>
        <ymax>113</ymax>
    </bndbox>
</object>
<object>
    <name>medium</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>227</xmin>
        <ymin>128</ymin>
        <xmax>290</xmax>
        <ymax>200</ymax>
    </bndbox>
</object>
<object>
    <name>low</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>                 <!-- duplicate -->
        <xmin>360</xmin>
        <ymin>38</ymin>
        <xmax>434</xmax>
        <ymax>113</ymax>
    </bndbox>
</object>
<object>
    <name>medium</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>                     <!-- duplicate -->
        <xmin>227</xmin>
        <ymin>128</ymin>
        <xmax>290</xmax>
        <ymax>200</ymax>
    </bndbox>
</object>
</annotation>

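In this example, two elements are duplicated. How can I remove all of the "object" elements that correspond to the duplicates?

How can I detect whether a duplicate exists and, once it is found, remove it?

Thanks for your answers.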

1 Answer:

Answer 0: (score: 0)

You can use BeautifulSoup and its extract() method:

data = '''<annotation>
<folder>Test</folder>
<filename>10 2019-02-06_20-32.png</filename>
<source>
    <database>undefined</database>
</source>
<size>
    <width>768</width>
    <height>574</height>
    <depth>3</depth>
</size>
<segmented>0</segmented>
<object>
    <name>low</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>360</xmin>
        <ymin>38</ymin>
        <xmax>434</xmax>
        <ymax>113</ymax>
    </bndbox>
</object>
<object>
    <name>medium</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>227</xmin>
        <ymin>128</ymin>
        <xmax>290</xmax>
        <ymax>200</ymax>
    </bndbox>
</object>
<object>
    <name>low</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>360</xmin>
        <ymin>38</ymin>
        <xmax>434</xmax>
        <ymax>113</ymax>
    </bndbox>
</object>
<object>
    <name>medium</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>227</xmin>
        <ymin>128</ymin>
        <xmax>290</xmax>
        <ymax>200</ymax>
    </bndbox>
</object>
</annotation>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

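# Objects already encountered; bs4 compares Tag objects by name,
# attributes and contents, so identical <object> blocks match.
seen = set()
for obj in soup.select('object'):
    if obj not in seen:
        # First occurrence: remember it and keep it in the tree.
        seen.add(obj)
        continue
    # Repeated occurrence: detach it from the tree.
    obj.extract()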

print(soup)
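
As an alternative sketch that is not part of the original answer, the same idea can be written with the standard library's xml.etree.ElementTree. This avoids running XML through an HTML parser: html.parser follows HTML rules, under which a few tag names (for example source) are treated as void elements, so such elements may not round-trip unchanged. The function name remove_duplicate_objects and the shortened sample document are hypothetical. The sketch assumes that all "object" elements are direct children of the root, and that two objects count as duplicates when their tags, attributes and stripped text match in document order, regardless of indentation.

```python
import xml.etree.ElementTree as ET


def remove_duplicate_objects(xml_text):
    """Return (deduplicated XML string, number of removed <object> elements)."""
    root = ET.fromstring(xml_text)
    seen = set()
    removed = 0
    # Iterate over a copy, because elements are removed from root in the loop.
    for obj in list(root.findall('object')):
        # Key: tag, sorted attributes and stripped text of every element in
        # the subtree, in document order. Nesting depth is not part of the
        # key, which is a simplification made for this sketch.
        key = tuple(
            (el.tag, tuple(sorted(el.attrib.items())), (el.text or '').strip())
            for el in obj.iter()
        )
        if key in seen:
            root.remove(obj)   # duplicate of an earlier <object>
            removed += 1
        else:
            seen.add(key)      # first occurrence, keep it
    return ET.tostring(root, encoding='unicode'), removed


# Shortened, hypothetical sample: the third object repeats the first one,
# only with different indentation.
sample = '''<annotation>
<object><name>low</name><bndbox><xmin>360</xmin></bndbox></object>
<object><name>medium</name><bndbox><xmin>227</xmin></bndbox></object>
<object>
    <name>low</name>
    <bndbox><xmin>360</xmin></bndbox>
</object>
</annotation>'''

result, removed = remove_duplicate_objects(sample)
print(removed)  # number of duplicates found; 0 means there were none
print(result)
```

The returned count answers the "how do I detect duplicates" part of the question: a non-zero value means at least one repeated object was found and removed.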