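I'm trying to understand how to extract certain data from an XML file using Python.
At the moment I request information from an API and get an XML file back, but I'd like to pull specific values straight out of the XML.
From what I can see, Element Tree looks like the answer, but I find it hard to follow and I'm not sure it is the right way to build a solution.
I've included the code I use to fetch the XML data, along with a shortened version of the XML it returns (only the important parts I need to extract are left in).
Thanks.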
import requests

#Import routes
routes = []

class routesClass:
    def __init__(self, name, url):  #,start,end,offset,rwe,al):
        self.n = name
        self.u = url
        #self.s=start
        #self.e=end
        #self.o=offset
        #self.r=rwe
        #self.a=al

#Add example route
testRoute1 = routesClass("EasternFwy-Hoddle/Johnston", "https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.7999012967757,144.99318476311566:?routeType=shortest&key=SECRETKEY&computeTravelTimeFor=all")
routes.append(testRoute1)
#routes.append(testRoute2)
print(routes[0].u)
And the XML:
<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<leg>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
Answer 0 (score: 1)
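I recommend lxml. In my opinion it makes walking the XML tree easier than Element Tree does. Here is a demo of how to use the module.
Example
Taking your XML, this is how I would parse it with lxml. Save the two files below as example.xml and xmlparse.py.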
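example.xml - The XML you provided is malformed: the fragment has no single root element, and the <leg> tag is never closed. These two problems prevent it from being parsed, so I removed the <leg> tag and grouped the two summary sections inside a <parent> tag. Here is the XML.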
<parent>
<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
</parent>
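To see why the fragment as originally posted cannot be parsed while the wrapped version can, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than lxml so it runs without extra packages, and is_well_formed is a hypothetical helper name, not part of either library. The two strings are cut-down versions of the XML above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Shape of the XML from the question: two top-level elements, <leg> never closed.
fragment = (
    "<summary><lengthInMeters>5144</lengthInMeters></summary>"
    "<leg><summary><lengthInMeters>806</lengthInMeters></summary>"
)

# Shape of the repaired XML: one <parent> root holding both summaries.
wrapped = (
    "<parent>"
    "<summary><lengthInMeters>5144</lengthInMeters></summary>"
    "<summary><lengthInMeters>806</lengthInMeters></summary>"
    "</parent>"
)

print(is_well_formed(fragment))  # False
print(is_well_formed(wrapped))   # True
```

The full response from the API is presumably a complete document with its own root element, in which case no manual repair should be needed; the wrapping is only required for the shortened fragment.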
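xmlparse.py - In this script I've given you a loop that prints each key (elem.tag) and value (text), plus a conditional that checks whether one particular key is present and whether its value is greater than 700. The conditional is only there to show how you can add a trigger inside the loop.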
from lxml import etree


def parseXML(xmlFile):
    """
    Parse the xml
    """
    with open(xmlFile) as fobj:
        xml = fobj.read()

    root = etree.fromstring(xml)

    # Iterate over the elements directly; getchildren() is deprecated.
    for appt in root:
        for elem in appt:
            # Empty elements have no text, so fall back to a placeholder.
            if not elem.text:
                text = "None"
            else:
                text = elem.text
            # This is doing something with the xml based on its tag and value.
            if elem.tag == 'travelTimeInSeconds' and int(text) > 700:
                print('******** Do something with', elem.tag, ':', text)
            print(elem.tag + " => " + text)


if __name__ == "__main__":
    parseXML("example.xml")
Output - If you save the code as xmlparse.py and the updated XML I provided as example.xml, running the script gives the following output:
lengthInMeters => 5144
******** Do something with travelTimeInSeconds : 764
travelTimeInSeconds => 764
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:54:58+11:00
noTrafficTravelTimeInSeconds => 478
historicTrafficTravelTimeInSeconds => 764
liveTrafficIncidentsTravelTimeInSeconds => 764
lengthInMeters => 806
travelTimeInSeconds => 67
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:43:21+11:00
noTrafficTravelTimeInSeconds => 59
historicTrafficTravelTimeInSeconds => 67
liveTrafficIncidentsTravelTimeInSeconds => 67
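If you only need a few specific values rather than every key, you can ask for them by tag name instead of looping over all children. The sketch below is one possible way to do that, using the standard library's xml.etree.ElementTree (the same calls, iter and findtext, also exist in lxml). The function name extract_summaries and the choice of fields are made up for illustration, and the inline XML is a trimmed copy of example.xml. It also assumes the tags carry no XML namespace; if the real API response declares one, the tag names would need the namespace prefix.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of example.xml, kept inline so the sketch is self-contained.
XML = """<parent>
  <summary>
    <lengthInMeters>5144</lengthInMeters>
    <travelTimeInSeconds>764</travelTimeInSeconds>
    <noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
  </summary>
  <summary>
    <lengthInMeters>806</lengthInMeters>
    <travelTimeInSeconds>67</travelTimeInSeconds>
    <noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
  </summary>
</parent>"""


def extract_summaries(xml_text, fields):
    """Return one dict per <summary>, holding only the requested integer fields."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() finds <summary> at any depth, so nesting inside <leg> is fine too.
    for summary in root.iter("summary"):
        row = {}
        for name in fields:
            value = summary.findtext(name)  # None if the tag is missing
            row[name] = int(value) if value is not None else None
        rows.append(row)
    return rows


summaries = extract_summaries(XML, ["lengthInMeters", "travelTimeInSeconds"])
for row in summaries:
    print(row)
```

To work on the API response directly instead of a saved file, the same function could be given the body returned by requests (for example the text of the response), assuming that body is a complete XML document.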