Finding elements in an XML file

Time: 2019-06-21 08:22:31

Tags: python xml

I have the following XML file:

<annotation>
  <folder>KAIST Multispectral Ped Benchmark</folder>
  <filename>set00/V003/I00397</filename>
  <source>
    <database>KAIST pedestrian</database>
    <annotation>KAIST pedestrian</annotation>
    <image>KAIST pedestrian</image>
    <url>https://soonminhwang.github.io/rgbt-ped-detection/</url>
    <note>Sanitized training annotation [BMVC18] (https://li-chengyang.github.io/home/MSDS-RCNN/)</note>
  </source>
  <size>
    <width>640</width>
    <height>512</height>
    <depth>4</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>person</name>
    <bndbox>
      <x>457</x>
      <y>217</y>
      <w>31</w>
      <h>78</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
  <object>
    <name>person</name>
    <bndbox>
      <x>486</x>
      <y>217</y>
      <w>29</w>
      <h>78</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
  <object>
    <name>people</name>
    <bndbox>
      <x>420</x>
      <y>226</y>
      <w>26</w>
      <h>41</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
</annotation>

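I want to extract certain elements from the file. For example, under the object elements there are three names: "person", "person" and "people". I have extracted the "bndbox" values with the following: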

box = {e.tag: int(e.text) for e in root.findall('.//bndbox/*')}

Output:

{'x': 420, 'y': 226, 'w': 26, 'h': 41}

But when I use the same approach to find "name", I get this output:

label = {e.tag: e.text for e in root.findall('.//name')}
{'name': 'people'}

It seems to output only the last value.

Any suggestions would be appreciated.
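The reason only the last value appears is that a dict can hold each key once. findall does return every matching element, but the comprehension stores each one under its tag ('name' here, or 'x'/'y'/'w'/'h' for the boxes), so every iteration overwrites the previous value. The same thing happens in the bndbox dictionary above: 420, 226, 26, 41 are the coordinates of the third object only. A minimal sketch on a trimmed copy of the file (only the name tags are kept, as an assumption for brevity):

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the annotation file: only the <name> tags matter here.
xml_text = """<annotation>
  <object><name>person</name></object>
  <object><name>person</name></object>
  <object><name>people</name></object>
</annotation>"""

root = ET.fromstring(xml_text)

matches = root.findall('.//name')
print(len(matches))  # findall returns all three <name> elements

# Keyed on e.tag, every element is written to the same key 'name',
# so each iteration overwrites the previous value and only the last survives.
label = {e.tag: e.text for e in matches}
print(label)

# A list keeps one entry per element, in document order.
labels = [e.text for e in matches]
print(labels)
```

This prints 3, then {'name': 'people'}, then ['person', 'person', 'people'].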

2 answers:

Answer 0 (score: 0)

Try:

[name.text for name in root.findall('object/name')]
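A self-contained version of this one-liner, assuming root is the <annotation> element returned by ET.fromstring (ET.parse(path).getroot() gives the same element when reading from a file). The path 'object/name' is evaluated relative to root, so it matches only <name> tags inside direct <object> children. The XML below is a trimmed copy of the question's file, kept short for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation file (assumption: other tags omitted for brevity).
xml_text = """<annotation>
  <source><database>KAIST pedestrian</database></source>
  <object><name>person</name></object>
  <object><name>person</name></object>
  <object><name>people</name></object>
</annotation>"""

root = ET.fromstring(xml_text)

# 'object/name' is relative to root (<annotation>): it selects the <name>
# child of every direct <object> child, in document order.
names = [name.text for name in root.findall('object/name')]
print(names)  # ['person', 'person', 'people']
```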

Answer 1 (score: 0)

Here is working code based on ElementTree:

import xml.etree.ElementTree as ET

xml = '''<annotation>
  <folder>KAIST Multispectral Ped Benchmark</folder>
  <filename>set00/V003/I00397</filename>
  <source>
    <database>KAIST pedestrian</database>
    <annotation>KAIST pedestrian</annotation>
    <image>KAIST pedestrian</image>
    <url>https://soonminhwang.github.io/rgbt-ped-detection/</url>
    <note>Sanitized training annotation [BMVC18] (https://li-chengyang.github.io/home/MSDS-RCNN/)</note>
  </source>
  <size>
    <width>640</width>
    <height>512</height>
    <depth>4</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>person</name>
    <bndbox>
      <x>457</x>
      <y>217</y>
      <w>31</w>
      <h>78</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
  <object>
    <name>person</name>
    <bndbox>
      <x>486</x>
      <y>217</y>
      <w>29</w>
      <h>78</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
  <object>
    <name>people</name>
    <bndbox>
      <x>420</x>
      <y>226</y>
      <w>26</w>
      <h>41</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
</annotation>'''

root = ET.fromstring(xml)
names = [n.text for n in root.findall('.//object/name')]
print(names)
boxes = [[box.find('x').text, box.find('y').text, box.find('w').text, 
          box.find('h').text] for box in
          root.findall('.//object/bndbox')]
print(boxes)

Output:

['person', 'person', 'people']
[['457', '217', '31', '78'], ['486', '217', '29', '78'], ['420', '226', '26', '41']]
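Because the names and boxes above are collected in two separate lists, they only line up if every object has exactly one name and one bndbox. A sketch that iterates over each <object> instead keeps them together and converts the coordinates to int, as the question's code does. The helper name parse_objects and the dict layout are illustrative assumptions, not part of the original answer, and the XML is a trimmed copy of the file:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation file (two of the three objects).
xml_text = """<annotation>
  <object>
    <name>person</name>
    <bndbox><x>457</x><y>217</y><w>31</w><h>78</h></bndbox>
  </object>
  <object>
    <name>people</name>
    <bndbox><x>420</x><y>226</y><w>26</w><h>41</h></bndbox>
  </object>
</annotation>"""


def parse_objects(root):
    """Hypothetical helper: return one dict per <object>, keeping name and box together.

    Assumes every <object> has a <name> and a <bndbox> child.
    """
    objects = []
    for obj in root.findall('object'):
        # findtext returns the text of the first matching child (None if missing).
        name = obj.findtext('name')
        # Iterating over an Element yields its children: <x>, <y>, <w>, <h>.
        box = {e.tag: int(e.text) for e in obj.find('bndbox')}
        objects.append({'name': name, 'bndbox': box})
    return objects


root = ET.fromstring(xml_text)
objs = parse_objects(root)
for o in objs:
    print(o)
```

Each printed dict pairs a label with its own box, e.g. {'name': 'person', 'bndbox': {'x': 457, 'y': 217, 'w': 31, 'h': 78}}, so a missing or extra tag in one object cannot shift the others out of alignment.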