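I've been trying to turn an XML API response into a readable pandas DataFrame. Earlier threads on this topic gave me plenty of ideas, but the values in the DataFrame still show up as None.
XML response: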
<VehiclePositionResponse xmlns="http://fms-standard.com/rfms/v1.0.0/xsd/position" xmlns:ns2="http://fms-standard.com/rfms/v1.0.0/xsd/common/position">
<VehiclePosition>
<VIN>YS2R8X40005440923</VIN>
<TriggerType>OTHER</TriggerType>
<CreatedDateTime>2019-07-31T16:50:28</CreatedDateTime>
<ReceivedDateTime>2019-07-31T16:50:29</ReceivedDateTime>
<GNSSPosition>
<ns2:Latitude>62.098339</ns2:Latitude>
<ns2:Longitude>10.542222</ns2:Longitude>
<ns2:Heading>291</ns2:Heading>
<ns2:Altitude>655</ns2:Altitude>
<ns2:Speed>0</ns2:Speed>
<ns2:PositionDateTime>2019-07-31T16:50:28</ns2:PositionDateTime>
</GNSSPosition>
<WheelBasedSpeed></WheelBasedSpeed>
</VehiclePosition>
<VehiclePosition>
<VIN>YS2R8X40005441367</VIN>
<TriggerType>OTHER</TriggerType>
<CreatedDateTime>2019-07-31T18:13:24</CreatedDateTime>
<ReceivedDateTime>2019-07-31T18:13:25</ReceivedDateTime>
<GNSSPosition>
<ns2:Latitude>62.127206</ns2:Latitude>
<ns2:Longitude>10.608676</ns2:Longitude>
<ns2:Heading>3</ns2:Heading>
etc.
Code:
import requests
import pandas as pd
import xml.etree.ElementTree as cET  # xml.etree.cElementTree was removed in Python 3.9

# url and Token are defined elsewhere (not shown in the question)
headers = {'Authorization': Token, 'Content-Type': 'application/xml'}
r = requests.get(url, headers=headers)

def getvalueofnode(node):
    # Return the element's text, or None if find() matched nothing
    return node.text if node is not None else None

def main():
    root = cET.fromstring(r.content)
    dfcols = ['VIN', 'CreatedDateTime', 'ReceivedDateTime', 'Latitude', 'Longitude', 'Altitude']
    df_xml = pd.DataFrame(columns=dfcols)
    for node in root:
        VIN = node.find('VIN')
        CreatedDateTime = node.find('CreatedDateTime')
        ReceivedDateTime = node.find('ReceivedDateTime')
        Latitude = node.find('Latitude')
        Longitude = node.find('Longitude')
        Altitude = node.find('Altitude')
        df_xml = df_xml.append(
            pd.Series([getvalueofnode(VIN), getvalueofnode(CreatedDateTime), getvalueofnode(ReceivedDateTime),
                       getvalueofnode(Latitude), getvalueofnode(Longitude), getvalueofnode(Altitude)],
                      index=dfcols),
            ignore_index=True)
    print(df_xml)

main()
Answer (score: 1)
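Essentially, you are not accounting for the namespaces declared in the root tag of your XML, which is most likely why every result is None. Parse with the namespaces defined instead. Because the default namespace has no prefix, give it one of your own, such as doc, and use that prefix in the search paths: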
ns = {"doc": "http://fms-standard.com/rfms/v1.0.0/xsd/position",
      "ns2": "http://fms-standard.com/rfms/v1.0.0/xsd/common/position"}

for node in root:
    VIN = node.find("doc:VIN", ns)
    CreatedDateTime = node.find('doc:CreatedDateTime', ns)
    ReceivedDateTime = node.find('doc:ReceivedDateTime', ns)
    Latitude = node.find('doc:GNSSPosition/ns2:Latitude', ns)
    Longitude = node.find('doc:GNSSPosition/ns2:Longitude', ns)
    Altitude = node.find('doc:GNSSPosition/ns2:Altitude', ns)
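To see why the original find('VIN') calls return None, the standalone sketch below parses a trimmed copy of the first record from the question (inlined as a string instead of fetched with requests). It prints the fully qualified tag name that ElementTree actually stores, then repeats the lookups with the same prefix map as above. The {*} wildcard at the end is an alternative that needs Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first <VehiclePosition> record from the question.
SAMPLE = """<VehiclePositionResponse xmlns="http://fms-standard.com/rfms/v1.0.0/xsd/position" xmlns:ns2="http://fms-standard.com/rfms/v1.0.0/xsd/common/position">
<VehiclePosition>
<VIN>YS2R8X40005440923</VIN>
<GNSSPosition>
<ns2:Latitude>62.098339</ns2:Latitude>
</GNSSPosition>
</VehiclePosition>
</VehiclePositionResponse>"""

ns = {"doc": "http://fms-standard.com/rfms/v1.0.0/xsd/position",
      "ns2": "http://fms-standard.com/rfms/v1.0.0/xsd/common/position"}

root = ET.fromstring(SAMPLE)
node = root[0]  # the first <VehiclePosition> element

# ElementTree stores each tag as "{namespace-uri}localname",
# so a bare "VIN" matches nothing.
print(node[0].tag)        # {http://fms-standard.com/rfms/v1.0.0/xsd/position}VIN
print(node.find("VIN"))   # None

# With a prefix mapped to the default namespace, the same lookups succeed.
print(node.find("doc:VIN", ns).text)                        # YS2R8X40005440923
print(node.find("doc:GNSSPosition/ns2:Latitude", ns).text)  # 62.098339

# Python 3.8+ also accepts "{*}" to match a tag in any namespace.
print(node.find("{*}GNSSPosition/{*}Latitude").text)        # 62.098339
```

The prefix map is the more explicit option; the wildcard is handy when you only care about local names and the namespace URIs might change between API versions.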
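Additionally, avoid calling append inside a loop: each call copies the whole DataFrame, so the total work grows quadratically with the number of rows (DataFrame.append was also deprecated in pandas 1.4 and removed in 2.0). Instead, build a list of dictionaries and pass it to the DataFrame() constructor: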
def main2():
    root = cET.fromstring(r.content)
    ns = {"doc": "http://fms-standard.com/rfms/v1.0.0/xsd/position",
          "ns2": "http://fms-standard.com/rfms/v1.0.0/xsd/common/position"}

    # One dictionary per <VehiclePosition> element
    data_list = [{'VIN': getvalueofnode(node.find("doc:VIN", ns)),
                  'CreatedDateTime': getvalueofnode(node.find('doc:CreatedDateTime', ns)),
                  'ReceivedDateTime': getvalueofnode(node.find('doc:ReceivedDateTime', ns)),
                  'Latitude': getvalueofnode(node.find('doc:GNSSPosition/ns2:Latitude', ns)),
                  'Longitude': getvalueofnode(node.find('doc:GNSSPosition/ns2:Longitude', ns)),
                  'Altitude': getvalueofnode(node.find('doc:GNSSPosition/ns2:Altitude', ns))}
                 for node in root]

    df_xml = pd.DataFrame(data_list)
    return df_xml
Output:
print(df_xml)
# Altitude CreatedDateTime Latitude Longitude ReceivedDateTime VIN
# 0 655 2019-07-31T16:50:28 62.098339 10.542222 2019-07-31T16:50:29 YS2R8X40005440923
# 1 None 2019-07-31T18:13:24 62.127206 10.608676 2019-07-31T18:13:25 YS2R8X40005441367
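For reference, here is a self-contained variant of main2() that runs without the API call. It is a sketch under stated assumptions, not the answer's original code: the XML is inlined (the first record from the question, trimmed, plus the truncated second record closed off so it parses), positions_to_frame and FIELDS are hypothetical names, Element.findtext() stands in for the getvalueofnode() helper, columns= pins the column order, and the text values are converted with pd.to_numeric and pd.to_datetime.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Two trimmed records from the question; the second was cut off after
# <ns2:Heading>, so it has no <ns2:Altitude> (closed here so it parses).
SAMPLE = """<VehiclePositionResponse xmlns="http://fms-standard.com/rfms/v1.0.0/xsd/position" xmlns:ns2="http://fms-standard.com/rfms/v1.0.0/xsd/common/position">
<VehiclePosition>
<VIN>YS2R8X40005440923</VIN>
<CreatedDateTime>2019-07-31T16:50:28</CreatedDateTime>
<ReceivedDateTime>2019-07-31T16:50:29</ReceivedDateTime>
<GNSSPosition>
<ns2:Latitude>62.098339</ns2:Latitude>
<ns2:Longitude>10.542222</ns2:Longitude>
<ns2:Altitude>655</ns2:Altitude>
</GNSSPosition>
</VehiclePosition>
<VehiclePosition>
<VIN>YS2R8X40005441367</VIN>
<CreatedDateTime>2019-07-31T18:13:24</CreatedDateTime>
<ReceivedDateTime>2019-07-31T18:13:25</ReceivedDateTime>
<GNSSPosition>
<ns2:Latitude>62.127206</ns2:Latitude>
<ns2:Longitude>10.608676</ns2:Longitude>
<ns2:Heading>3</ns2:Heading>
</GNSSPosition>
</VehiclePosition>
</VehiclePositionResponse>"""

NS = {"doc": "http://fms-standard.com/rfms/v1.0.0/xsd/position",
      "ns2": "http://fms-standard.com/rfms/v1.0.0/xsd/common/position"}

# Hypothetical mapping: output column -> path relative to each <VehiclePosition>.
FIELDS = {
    "VIN": "doc:VIN",
    "CreatedDateTime": "doc:CreatedDateTime",
    "ReceivedDateTime": "doc:ReceivedDateTime",
    "Latitude": "doc:GNSSPosition/ns2:Latitude",
    "Longitude": "doc:GNSSPosition/ns2:Longitude",
    "Altitude": "doc:GNSSPosition/ns2:Altitude",
}


def positions_to_frame(xml_text):
    """Build one DataFrame row per <VehiclePosition>, columns in FIELDS order."""
    root = ET.fromstring(xml_text)
    # findtext() returns None (its default) when the path matches nothing.
    rows = [{col: vp.findtext(path, namespaces=NS) for col, path in FIELDS.items()}
            for vp in root.findall("doc:VehiclePosition", NS)]
    df = pd.DataFrame(rows, columns=list(FIELDS))
    # All values arrive as text; missing or unparseable numbers become NaN.
    for col in ("Latitude", "Longitude", "Altitude"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    for col in ("CreatedDateTime", "ReceivedDateTime"):
        df[col] = pd.to_datetime(df[col])
    return df


df = positions_to_frame(SAMPLE)
print(df)
```

Passing r.content from the question's request to positions_to_frame() should work the same way, assuming the live response has this structure. Because findall() targets doc:VehiclePosition explicitly, any other child elements of the root are skipped rather than producing all-None rows.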