How to parse nested XML (with child elements of the same name) into CSV?

Time: 2017-08-02 22:53:46

Tags: python xml python-3.x csv parsing

I am new to Python. I have searched a lot but could not find a solution. I want to parse the following XML file into a CSV file.

<List>
  <item>
     <id>5939c5e20d82880efce93933</id>
     <sensorEvents>
        <sensorEvents>
            <avgSped>48.55647532226298</avgSped>
            <completed>true</completed>
        </sensorEvents>
        <sensorEvents>
            <avgSped>39.53368357145088</avgSped>
            <completed>true</completed>
        </sensorEvents>
        <sensorEvents>
            <avgSped>41.41160105233052</avgSped>
            <completed>true</completed>
        </sensorEvents>
     </sensorEvents>
  </item>

  .
  .
  .
  .

</List>

The code I wrote is:

import xml.etree.ElementTree as ET
import csv
tree = ET.parse("my_xml_file.xml")
root = tree.getroot()
f = open('my_csv_file.csv', 'w')
csvwriter = csv.writer(f)

head = ['ID','avgSped','completed']
csvwriter.writerow(head)

for Item in root.findall('item'):

    for Sensorevents in Item.findall('sensorEvents'):


        row = []
        id_ = Item.find('id').text
        row.append(id_)

        avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
        row.append(avgSped_)

        completed_ = Sensorevents.find('sensorEvents').find('completed').text
        row.append(completed_)

        csvwriter.writerow(row)


f.close()

The result is as follows:

[Image: screenshot of the resulting CSV file]

There are 3 inner sensorEvents elements, but my code only captures the first one. How can I modify the code to read all of them? Any help is greatly appreciated.

2 Answers:

Answer 0: (score: 2)

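Because your outer <sensorEvents> tag contains three inner <sensorEvents> tags, the outer <sensorEvents> hides its <sensorEvents> children: searching the item for 'sensorEvents' matches only the outer wrapper.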

This means that

    for Sensorevents in Item.findall('sensorEvents'):

will loop only once for each outer block of the form

<sensorEvents>
    <sensorEvents>
        <avgSped>48.55647532226298</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>39.53368357145088</avgSped>
        <completed>true</completed>
    </sensorEvents>
    <sensorEvents>
        <avgSped>41.41160105233052</avgSped>
        <completed>true</completed>
    </sensorEvents>
</sensorEvents>

Then

    avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
    row.append(avgSped_)

    completed_ = Sensorevents.find('sensorEvents').find('completed').text

retrieves the data of the first inner tag only, because find() returns just the first matching child.
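The behaviour described above can be checked directly. The following is a minimal sketch, assuming the XML from the question trimmed to a single item and loaded from a string rather than a file; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, with a single <item>.
XML = """<List>
  <item>
    <id>5939c5e20d82880efce93933</id>
    <sensorEvents>
      <sensorEvents><avgSped>48.55647532226298</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>39.53368357145088</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>41.41160105233052</avgSped><completed>true</completed></sensorEvents>
    </sensorEvents>
  </item>
</List>"""

root = ET.fromstring(XML)
item = root.find('item')

# findall() with a bare tag name matches direct children only,
# so on the item it sees just the outer wrapper element.
outer = item.findall('sensorEvents')
print(len(outer))  # 1

# find() returns only the first matching child of the wrapper.
first_inner = outer[0].find('sensorEvents')
print(first_inner.find('avgSped').text)  # 48.55647532226298

# findall() on the wrapper returns every inner element.
inner = outer[0].findall('sensorEvents')
print(len(inner))  # 3
```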

You should try

for Item in root.findall('item'):
    for root_Sensorevents in Item.findall('sensorEvents'):
        for Sensorevents in root_Sensorevents.findall('sensorEvents'):
...
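One way the elided loop body could be filled in is sketched below. This is not necessarily what the answerer had in mind: the helper name xml_to_rows is hypothetical, the XML is a trimmed copy of the question's data, and an in-memory io.StringIO buffer stands in for the output file so the sketch runs on its own.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, with a single <item>.
XML = """<List>
  <item>
    <id>5939c5e20d82880efce93933</id>
    <sensorEvents>
      <sensorEvents><avgSped>48.55647532226298</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>39.53368357145088</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>41.41160105233052</avgSped><completed>true</completed></sensorEvents>
    </sensorEvents>
  </item>
</List>"""


def xml_to_rows(root):
    """Hypothetical helper: one [id, avgSped, completed] row per inner element."""
    rows = []
    for item in root.findall('item'):
        id_ = item.find('id').text
        # Outer loop: the wrapper <sensorEvents> directly under <item>.
        for wrapper in item.findall('sensorEvents'):
            # Inner loop: the actual events nested inside the wrapper.
            for event in wrapper.findall('sensorEvents'):
                rows.append([
                    id_,
                    event.find('avgSped').text,
                    event.find('completed').text,
                ])
    return rows


root = ET.fromstring(XML)
rows = xml_to_rows(root)

# StringIO stands in for the CSV file in this sketch.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['ID', 'avgSped', 'completed'])
writer.writerows(rows)
print(buffer.getvalue())
```

To write a real file instead, the buffer can be replaced by `with open('my_csv_file.csv', 'w', newline='') as f:`; the csv module documentation recommends newline='' so that no extra blank lines appear on Windows.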

Answer 1: (score: 0)

You might also consider using the lxml library, because it lets you search with XPath expressions, which often makes the code simpler.

Here, the XPath expression .//sensorEvents/sensorEvents means: look for sensorEvents elements anywhere in the document, then look for sensorEvents elements immediately beneath them.

Once you have those elements, writing expressions for their child values is usually straightforward, as shown.

>>> from lxml import etree
>>> tree = etree.parse('temp2.xml')
>>> inner_sensorEvents = tree.xpath('.//sensorEvents/sensorEvents')
>>> for inner_sensorEvent in inner_sensorEvents:
...     inner_sensorEvent.find('avgSped').text, inner_sensorEvent.find('completed').text
... 
('48.55647532226298', 'true')
('39.53368357145088', 'true')
('41.41160105233052', 'true')
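If installing lxml is not an option, the standard library's xml.etree.ElementTree supports a limited subset of XPath that appears to be sufficient for this case. The following is a minimal sketch under that assumption, using the question's XML trimmed to one item and loaded from a string; searching from each item keeps the id available for every row.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, with a single <item>.
XML = """<List>
  <item>
    <id>5939c5e20d82880efce93933</id>
    <sensorEvents>
      <sensorEvents><avgSped>48.55647532226298</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>39.53368357145088</avgSped><completed>true</completed></sensorEvents>
      <sensorEvents><avgSped>41.41160105233052</avgSped><completed>true</completed></sensorEvents>
    </sensorEvents>
  </item>
</List>"""

root = ET.fromstring(XML)

# Same expression as in the lxml session, evaluated by ElementTree:
# every sensorEvents element that is a child of another sensorEvents.
all_inner = root.findall('.//sensorEvents/sensorEvents')
print(len(all_inner))

# Searching relative to each item keeps its id at hand for the CSV row.
rows = []
for item in root.findall('item'):
    id_ = item.findtext('id')
    for event in item.findall('sensorEvents/sensorEvents'):
        rows.append((id_, event.findtext('avgSped'), event.findtext('completed')))

for row in rows:
    print(row)
```

The path 'sensorEvents/sensorEvents' replaces the two nested loops over the wrapper and its children with a single findall() call per item.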