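How do I convert specific values from an XML file to a CSV file using Python?

Time: 2019-05-07 10:09:04

Tags: python xml-parsing

I am trying to extract the name and the xmin, ymin, xmax and ymax values of every object tag.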

XML

<annotation>
    <folder>Plates_Number</folder>
    <filename>1.png</filename>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>294</width>
        <height>60</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>2</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>40</xmin>
            <ymin>1</ymin>
            <xmax>69</xmax>
            <ymax>42</ymax>
        </bndbox>
    </object>
    <object>
        <name>10</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>67</xmin>
            <ymin>3</ymin>
            <xmax>101</xmax>
            <ymax>43</ymax>
        </bndbox>
    </object>
    <object>
        <name>1</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>122</xmin>
            <ymin>2</ymin>
            <xmax>153</xmax>
            <ymax>45</ymax>
        </bndbox>
    </object>
    <object>
        <name>10</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>151</xmin>
            <ymin>3</ymin>
            <xmax>183</xmax>
            <ymax>44</ymax>
        </bndbox>
    </object>
    <object>
        <name>2</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>186</xmin>
            <ymin>4</ymin>
            <xmax>216</xmax>
            <ymax>47</ymax>
        </bndbox>
    </object>
    <object>
        <name>5</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>214</xmin>
            <ymin>5</ymin>
            <xmax>245</xmax>
            <ymax>46</ymax>
        </bndbox>
    </object>
</annotation>

This is what I tried, but I did not get the expected result.

python

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("1.xml")
root = tree.getroot()

# open a file for writing

data = open('test.csv', 'r+')

# create the csv writer object

csvwriter = csv.writer(data)
data_head = []

count = 0
for member in root.findall('object'):
    obj = []
    bndbox_list = []
    if count == 0:
        name = member.find('name').tag
        data_head.append(name)
        bndbox = member[4].tag
        data_head.append(bndbox)
        csvwriter.writerow(data_head)
        count = count + 1

    name = member.find('name').text
    obj.append(name)
    bndbox = member[4][0].text
    bndbox_list.append(bndbox)
    xmin = member[4][1].text
    bndbox_list.append(xmin)
    ymin = member[4][2].text
    bndbox_list.append(ymin)
    xmax = member[4][3].text
    bndbox_list.append(xmax)
    ymax = member[4][4].text
    bndbox_list.append(ymax)
    obj.append(bndbox)
    csvwriter.writerow(data)
data.close()

I expected:

name xmin ymin xmax ymax
2 40 1 69 42
10 67 3 101 43
1 122 2 153 45
10 151 3 183 44
2 186 4 216 47
5 214 5 245 46

But I only get these two headers

name bndbox

and no values.

2 Answers:

Answer 0 (score: 0)

Code:

import xml.etree.ElementTree as ET

root = ET.parse('file.xml').getroot()


for type_tag in root.findall('object'):
    name = type_tag.find('name').text
    xmin = type_tag.find('bndbox/xmin').text
    ymin = type_tag.find('bndbox/ymin').text
    xmax = type_tag.find('bndbox/xmax').text
    ymax = type_tag.find('bndbox/ymax').text

    print([name,xmin,ymin,xmax,ymax])

Output:

['2', '40', '1', '69', '42']
['10', '67', '3', '101', '43']
['1', '122', '2', '153', '45']
['10', '151', '3', '183', '44']
['2', '186', '4', '216', '47']
['5', '214', '5', '245', '46']
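The snippet above only prints the rows, while the question asks for a CSV file. The attempt in the question most likely stops after the header because `member[4][4]` is out of range (`<bndbox>` has four children, indices 0 to 3), and `csvwriter.writerow(data)` passes the file object instead of the row. Below is a minimal sketch that combines the lookup-by-tag approach with `csv.writer`. The function names `annotation_to_rows` and `write_csv` and the `FIELDS` constant are made up for this illustration, and the header uses the column names the question asked for.

```python
import csv
import xml.etree.ElementTree as ET

# Bounding-box child tags, in the column order requested in the question
FIELDS = ["xmin", "ymin", "xmax", "ymax"]


def annotation_to_rows(xml_path):
    """Return one [name, xmin, ymin, xmax, ymax] row per <object> element."""
    root = ET.parse(xml_path).getroot()
    rows = []
    for obj in root.findall("object"):
        # findtext() looks children up by tag name, so the position of
        # <bndbox> inside <object> does not matter
        row = [obj.findtext("name")]
        row += [obj.findtext(f"bndbox/{field}") for field in FIELDS]
        rows.append(row)
    return rows


def write_csv(rows, csv_path):
    """Write a header line followed by the given rows."""
    # "w" creates the file if it is missing ("r+" requires it to exist);
    # newline="" is what the csv module documentation recommends
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name"] + FIELDS)
        writer.writerows(rows)
```

With the file names from the question, the call would be `write_csv(annotation_to_rows("1.xml"), "test.csv")`.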

Answer 1 (score: 0)

If you can use BeautifulSoup, you could use

from bs4 import BeautifulSoup

# 'xml' selects the XML parser (this needs lxml to be installed)
soup = BeautifulSoup(input_xml_string, 'xml')
tgs = soup.find_all('object')
l = [(i.find('name').string, i.xmin.string, i.ymin.string, i.xmax.string, i.ymax.string) for i in tgs]

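where input_xml_string is the input XML in string form.

soup will be a BeautifulSoup object, which is a representation of the XML tree.

An XML parser is used.

Then the find_all() function is used to find all the <object> tags in the XML. The result is stored in tgs.

Now, from the elements in tgs (which are the <object> tags), we select the child tags we need, which are Tag objects, and get their values using their string attribute.

We could have accessed the value in <name> through the string attribute in the same way, but name is also the name of an attribute of the Tag class. So we first get the <name> child of <object> using find() and then get its content.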

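Now, if we print the values in l,

print(l)

we would get

[('2', '40', '1', '69', '42'), ('10', '67', '3', '101', '43'), ('1', '122', '2', '153', '45'), ('10', '151', '3', '183', '44'), ('2', '186', '4', '216', '47'), ('5', '214', '5', '245', '46')]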
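Since the question asks for CSV output, a list of tuples such as `l` can be passed to `csv.writer.writerows`. The sketch below is only illustrative: the helper name `rows_to_csv_text` is made up, and two hard-coded sample rows taken from the question's XML stand in for `l`, so that it runs without BeautifulSoup installed. The strings returned by BeautifulSoup are `str` subclasses, so they should be written the same way.

```python
import csv
import io

# Sample rows copied from the question's XML, standing in for the list `l`
rows = [("2", "40", "1", "69", "42"), ("10", "67", "3", "101", "43")]


def rows_to_csv_text(rows):
    """Render a header plus the given rows as CSV text."""
    buf = io.StringIO()
    # lineterminator="\n" keeps the text easy to read when printed
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "xmin", "ymin", "xmax", "ymax"])
    writer.writerows(rows)
    return buf.getvalue()


print(rows_to_csv_text(rows))
```

To write to disk instead, the same writer calls can be used on a file opened with `open('test.csv', 'w', newline='')`.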