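I'm currently trying to convert thousands of XML files to CSV so that I can do some simpler data work. I'm converting just one of them first to make sure it works, and then I'll loop over the rest.
I've managed to figure out most of it thanks to a nice tutorial I found online. My XML file looks like this: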
<?xml version="1.0" encoding="UTF-8"?>
<orbit id="14737">
  <frame>
    <time>2015-08-15T05:28:39.014</time>
    <sza>113.48 deg</sza>
    <alt>1552 km</alt>
    <lat>-66.96 deg</lat>
    <lon>196.11 deg</lon>
    <x>-0.58 Rm</x>
    <rho>1.33 Rm</rho>
    <hperiod>0</hperiod>
    <hperiodquality>0</hperiodquality>
    <vperiod delaytime="167.443 μs">0</vperiod>
    <vperiodquality>0</vperiodquality>
    <cutoff>0</cutoff>
    <ionospheretrace delaytime="167.443 μs"/>
    <maxfreqquality>0</maxfreqquality>
    <groundtrace delaytime="167.443 μs"/>
  </frame>
...
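It goes on like this, of course.
My problem is with lines like the ionospheretrace delaytime one, which don't follow the usual format of the XML file: the value sits in a delaytime attribute of a self-closing element instead of between the tags.
My Python code is as follows: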
import xml.etree.ElementTree as ET
import csv

tree = ET.parse("14737.xml")
root = tree.getroot()

# open a file for writing (newline='' is what the csv module expects;
# utf-8 so the "μ" in the delay times can be written)
Orbit_data = open('/csv/14737', 'w', newline='', encoding='utf-8')

# create the csv writer object
csvwriter = csv.writer(Orbit_data)

orbit_head = ['time', 'sza', 'alt', 'lat', 'lon', 'x', 'rho',
              'hperiod', 'hperiodquality', 'vperiod', 'vperiodquality',
              'cutoff', 'ionospheretrace delaytime', 'maxfreqquality',
              'groundtrace delaytime']
csvwriter.writerow(orbit_head)

for member in root.findall('frame'):
    frame = []
    time = member.find('time').text
    frame.append(time)
    sza = member.find('sza').text
    frame.append(sza)
    alt = member.find('alt').text
    frame.append(alt)
    lat = member.find('lat').text
    frame.append(lat)
    lon = member.find('lon').text
    frame.append(lon)
    x = member.find('x').text
    frame.append(x)
    rho = member.find('rho').text
    frame.append(rho)
    hperiod = member.find('hperiod').text
    frame.append(hperiod)
    hperiodquality = member.find('hperiodquality').text
    frame.append(hperiodquality)
    vperiod = member.find('vperiod').text
    frame.append(vperiod)
    vperiodquality = member.find('vperiodquality').text
    frame.append(vperiodquality)
    cutoff = member.find('cutoff').text
    frame.append(cutoff)
    # this is where it breaks: 'ionospheretrace delaytime' is not an element,
    # the delay time is an attribute of <ionospheretrace/> (same for groundtrace)
    ionospheretrace_delaytime = member.find('ionospheretrace delaytime').text
    frame.append(ionospheretrace_delaytime)
    maxfreqquality = member.find('maxfreqquality').text
    frame.append(maxfreqquality)
    groundtrace_delaytime = member.find('groundtrace delaytime').text
    frame.append(groundtrace_delaytime)
    csvwriter.writerow(frame)

Orbit_data.close()
I'd like to be able to store the delay times somehow, but I'm not sure how.
Thanks!
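The troublesome values are stored as XML attributes rather than as element text. In ElementTree, Element.text holds what is between the tags, while Element.get(name) (or the Element.attrib dict) returns an attribute's value. A minimal sketch, using a frame trimmed down from the XML above:

```python
import xml.etree.ElementTree as ET

# A trimmed-down <frame> taken from the XML above.
snippet = """<frame>
<vperiod delaytime="167.443 μs">0</vperiod>
<ionospheretrace delaytime="167.443 μs"/>
</frame>"""

frame = ET.fromstring(snippet)

vperiod = frame.find('vperiod')
print(vperiod.text)              # element text: 0
print(vperiod.get('delaytime'))  # attribute value: 167.443 μs

iono = frame.find('ionospheretrace')
print(iono.text)                 # self-closing element, so no text: None
print(iono.get('delaytime'))     # attribute value: 167.443 μs
```

In the loop above, the two failing lookups would become member.find('ionospheretrace').get('delaytime') and member.find('groundtrace').get('delaytime').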
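Answer
Below is a generic way to collect the data.
The idea is to mark the "special" tags (the ones where we need to use the attribute value instead of the element text).
I skipped the CSV generation, since your main challenge is extracting the data from the XML.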
import xml.etree.ElementTree as ET

# elements whose value is read from the delaytime attribute instead of the text
# (for vperiod this keeps the delay time and drops the element text)
ATTRIBUTE_BASED_ELEMENTS = ['ionospheretrace', 'vperiod', 'groundtrace']

tree = ET.parse('56116141.xml')
root = tree.getroot()

data = []
for frame in root.findall('.//frame'):
    one_frame = []
    for child in list(frame):
        if child.tag in ATTRIBUTE_BASED_ELEMENTS:
            one_frame.append(child.attrib['delaytime'])
        else:
            one_frame.append(child.text)
    data.append(one_frame)

for frame in data:
    print(frame)
56116141.xml
<?xml version="1.0" encoding="UTF-8"?>
<orbit id="14737">
  <frame>
    <time>2015-08-15T05:28:39.014</time>
    <sza>113.48 deg</sza>
    <alt>1552 km</alt>
    <lat>-66.96 deg</lat>
    <lon>196.11 deg</lon>
    <x>-0.58 Rm</x>
    <rho>1.33 Rm</rho>
    <hperiod>0</hperiod>
    <hperiodquality>0</hperiodquality>
    <vperiod delaytime="167.443 μs">0</vperiod>
    <vperiodquality>0</vperiodquality>
    <cutoff>0</cutoff>
    <ionospheretrace delaytime="167.443 μs"/>
    <maxfreqquality>0</maxfreqquality>
    <groundtrace delaytime="167.443 μs"/>
  </frame>
  <frame>
    <time>2016-08-15T05:28:39.014</time>
    <sza>113.42 deg</sza>
    <alt>1553 km</alt>
    <lat>-66.16 deg</lat>
    <lon>196.41 deg</lon>
    <x>-0.56 Rm</x>
    <rho>1.39 Rm</rho>
    <hperiod>1</hperiod>
    <hperiodquality>1</hperiodquality>
    <vperiod delaytime="107.443 μs">0</vperiod>
    <vperiodquality>1</vperiodquality>
    <cutoff>1</cutoff>
    <ionospheretrace delaytime="167.343 μs"/>
    <maxfreqquality>1</maxfreqquality>
    <groundtrace delaytime="967.443 μs"/>
  </frame>
</orbit>
Output
['2015-08-15T05:28:39.014', '113.48 deg', '1552 km', '-66.96 deg', '196.11 deg', '-0.58 Rm', '1.33 Rm', '0', '0', '167.443 μs', '0', '0', '167.443 μs', '0', '167.443 μs']
['2016-08-15T05:28:39.014', '113.42 deg', '1553 km', '-66.16 deg', '196.41 deg', '-0.56 Rm', '1.39 Rm', '1', '1', '107.443 μs', '1', '1', '167.343 μs', '1', '967.443 μs']
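The answer leaves out the CSV step. Below is a minimal sketch of it, under a few assumptions: frames are flat (each child is a leaf element), and no tag appears twice in one frame. Instead of hard-coding which tags are "special", it writes one column for each element's text and one extra column per attribute (named like "vperiod delaytime"), so vperiod keeps both its 0 and its delay time. The function names frame_to_record, convert_file and convert_directory are hypothetical, chosen for this illustration.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def frame_to_record(frame):
    """Flatten one <frame> into a list of (column name, value) pairs.

    Each child contributes its text (if it has any) under its tag name,
    plus one column per attribute, named "<tag> <attribute>".
    """
    record = []
    for child in frame:
        if child.text is not None and child.text.strip():
            record.append((child.tag, child.text.strip()))
        for name, value in child.attrib.items():
            record.append((f"{child.tag} {name}", value))
    return record


def convert_file(xml_path, csv_path):
    """Convert one orbit XML file into a CSV file with a header row."""
    root = ET.parse(xml_path).getroot()
    # dict() keeps the last value if a column name repeats within a frame
    rows = [dict(frame_to_record(f)) for f in root.iter('frame')]

    # Column names in first-seen order across all frames; a frame that
    # lacks an element just gets an empty cell (DictWriter's restval='').
    fieldnames = list(dict.fromkeys(name for row in rows for name in row))

    # newline='' is what the csv module expects; utf-8 keeps the "μ" intact.
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def convert_directory(src_dir, dst_dir):
    """Convert every *.xml file in src_dir into a same-named .csv in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for xml_path in sorted(Path(src_dir).glob('*.xml')):
        convert_file(xml_path, dst / (xml_path.stem + '.csv'))
```

For a single file you would call convert_file('56116141.xml', '56116141.csv'); for the thousands of files in the question, convert_directory('xml_in', 'csv_out'), where both folder names are placeholders. With the sample file above, the header row comes out as: time, sza, alt, lat, lon, x, rho, hperiod, hperiodquality, vperiod, vperiod delaytime, vperiodquality, cutoff, ionospheretrace delaytime, maxfreqquality, groundtrace delaytime.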