I'm new to Python. I've searched a lot but couldn't find a solution. I want to parse the following xml file into a csv file.
<List>
<item>
<id>5939c5e20d82880efce93933</id>
<sensorEvents>
<sensorEvents>
<avgSped>48.55647532226298</avgSped>
<completed>true</completed>
</sensorEvents>
<sensorEvents>
<avgSped>39.53368357145088</avgSped>
<completed>true</completed>
</sensorEvents>
<sensorEvents>
<avgSped>41.41160105233052</avgSped>
<completed>true</completed>
</sensorEvents>
</sensorEvents>
</item>
.
.
.
.
</List>
The code I wrote is:
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("my_xml_file.xml")
root = tree.getroot()
f = open('my_csv_file.csv', 'w')
csvwriter = csv.writer(f)
head = ['ID','avgSped','completed']
csvwriter.writerow(head)
for Item in root.findall('item'):
    for Sensorevents in Item.findall('sensorEvents'):
        row = []
        id_ = Item.find('id').text
        row.append(id_)
        avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
        row.append(avgSped_)
        completed_ = Sensorevents.find('sensorEvents').find('completed').text
        row.append(completed_)
        csvwriter.writerow(row)
f.close()
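The result is as follows: there are 3 sensorEvents, but my code only captures the first one. How can I modify the code to read all of the sensorEvents? Any help is much appreciated.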
Answer 0 (score: 2)
Because your outer <sensorEvents> tag contains 3 <sensorEvents> children, Item.findall('sensorEvents') matches only the outer tag, and the inner <sensorEvents> tags stay hidden inside it.
This means that
for Sensorevents in Item.findall('sensorEvents'):
loops only once per item, over the outer element:
<sensorEvents>
<sensorEvents>
<avgSped>48.55647532226298</avgSped>
<completed>true</completed>
</sensorEvents>
<sensorEvents>
<avgSped>39.53368357145088</avgSped>
<completed>true</completed>
</sensorEvents>
<sensorEvents>
<avgSped>41.41160105233052</avgSped>
<completed>true</completed>
</sensorEvents>
</sensorEvents>
Then
avgSped_ = Sensorevents.find('sensorEvents').find('avgSped').text
row.append(avgSped_)
completed_ = Sensorevents.find('sensorEvents').find('completed').text
reads the data of only the first inner tag, because find() returns just the first match.
You should try
for Item in root.findall('item'):
    for root_Sensorevents in Item.findall('sensorEvents'):
        for Sensorevents in root_Sensorevents.findall('sensorEvents'):
            ...
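For reference, the nested loops above can be assembled into a complete script along these lines. This is only a sketch: SAMPLE_XML is an inline stand-in for my_xml_file.xml, the CSV is written to an in-memory buffer instead of my_csv_file.csv, and the helper names sensor_rows and write_csv are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Inline stand-in for my_xml_file.xml (assumed to have the question's shape).
SAMPLE_XML = """<List>
  <item>
    <id>5939c5e20d82880efce93933</id>
    <sensorEvents>
      <sensorEvents>
        <avgSped>48.55647532226298</avgSped>
        <completed>true</completed>
      </sensorEvents>
      <sensorEvents>
        <avgSped>39.53368357145088</avgSped>
        <completed>true</completed>
      </sensorEvents>
      <sensorEvents>
        <avgSped>41.41160105233052</avgSped>
        <completed>true</completed>
      </sensorEvents>
    </sensorEvents>
  </item>
</List>"""


def sensor_rows(root):
    """Yield one [id, avgSped, completed] row per inner <sensorEvents>."""
    for item in root.findall('item'):
        item_id = item.find('id').text
        # Outer <sensorEvents> is a direct child of <item> ...
        for outer in item.findall('sensorEvents'):
            # ... and the inner <sensorEvents> are its direct children.
            for inner in outer.findall('sensorEvents'):
                yield [item_id,
                       inner.find('avgSped').text,
                       inner.find('completed').text]


def write_csv(root, out):
    """Write the header and all sensor rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(['ID', 'avgSped', 'completed'])
    writer.writerows(sensor_rows(root))


root = ET.fromstring(SAMPLE_XML)
rows = list(sensor_rows(root))

buffer = io.StringIO()
write_csv(root, buffer)
print(buffer.getvalue())
```

To write a real file, pass open('my_csv_file.csv', 'w', newline='') to write_csv, preferably inside a with block; the csv module documentation recommends newline='' so that no extra blank lines appear on Windows.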
Answer 1 (score: 0)
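You might also consider using the lxml library, because it lets you search with xpath expressions, which can often make the code simpler.
Here, the xpath expression .//sensorEvents/sensorEvents means: look for sensorEvents elements anywhere in the document, then look for the sensorEvents elements immediately beneath them.
Once you've got the hang of this, writing expressions for the properties of elements is usually straightforward, as shown.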
>>> from lxml import etree
>>> tree = etree.parse('temp2.xml')
>>> inner_sensorEvents = tree.xpath('.//sensorEvents/sensorEvents')
>>> for inner_sensorEvent in inner_sensorEvents:
...     inner_sensorEvent.find('avgSped').text, inner_sensorEvent.find('completed').text
...
('48.55647532226298', 'true')
('39.53368357145088', 'true')
('41.41160105233052', 'true')
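If installing lxml is not an option, the same path also seems to work with the standard library, since xml.etree.ElementTree supports a limited subset of XPath in findall(). A rough sketch, where SAMPLE_XML is an inline stand-in for the real file and the variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the real XML file (assumed to have the question's shape).
SAMPLE_XML = """<List>
  <item>
    <id>5939c5e20d82880efce93933</id>
    <sensorEvents>
      <sensorEvents>
        <avgSped>48.55647532226298</avgSped>
        <completed>true</completed>
      </sensorEvents>
      <sensorEvents>
        <avgSped>39.53368357145088</avgSped>
        <completed>true</completed>
      </sensorEvents>
      <sensorEvents>
        <avgSped>41.41160105233052</avgSped>
        <completed>true</completed>
      </sensorEvents>
    </sensorEvents>
  </item>
</List>"""

root = ET.fromstring(SAMPLE_XML)

# Same path as the lxml example: inner sensorEvents anywhere in the document.
pairs = [
    (inner.findtext('avgSped'), inner.findtext('completed'))
    for inner in root.findall('.//sensorEvents/sensorEvents')
]

# Relative path per item, so each row can also carry the item's id.
rows_with_id = [
    (item.findtext('id'), inner.findtext('avgSped'), inner.findtext('completed'))
    for item in root.findall('item')
    for inner in item.findall('sensorEvents/sensorEvents')
]

for pair in pairs:
    print(pair)
```

For more complex queries (predicates on values, functions, axes) lxml's full XPath support is still the better fit.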