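I am working with an open-source project called OpenTripPlanner, a tool I plan to use to simulate many trips from one point to another at a given time. So far I have managed to find a URL that returns an XML file with all the information about a trip. The XML is built on request, so the URL is not static. The URL looks like this:
(You need a running OpenTripPlanner server to open it.)
Now I would like to read these XML files and do some data analysis with Python 3, but I cannot find a way to read them. I tried downloading the file locally with urllib.request, but the file I get back is strangely formed. It looks like this: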
{"requestParameters":{"date":"2017/12/04","mode":"TRANSIT,WALK","fromPlace":"48.40915,-71.04996","toPlace":"48.41428,-71.06996","time":"8:00:00"},"plan":{"date":1512392400000,"from":{"name":"Origin","lon":-71.04996,"lat":48.40915,"orig":"","vertexType":"NORMAL"},"to":{"name":"Destination","lon":-71.06996,"lat":48.41428,"orig":"","vertexType":"NORMAL"},"itineraries":[{"duration":1538,"startTime":1512392809000,"endTime":1512394347000,"walkTime":934,"transitTime":602,"waitingTime":2,"walkDistance":1189.6595112715966,"walkLimitExceeded":false,"elevationLost":0.0,"elevationGained":0.0,"transfers":0,"legs":[{"startTime":1512392809000,"endTime":1512393537000,"departureDelay":0,"arrivalDelay":0,"realTime":false,"distance":926.553,"pathway":false,"mode":"WALK","route":"","agencyTimeZoneOffset":-18000000,"interlineWithPreviousLeg":false,"from":{"name":"Origin","lon":-71.04996,"lat":48.40915,"departure":1512392809000,"orig":"","vertexType":"NORMAL"},"to":{"name":"Roitelets / Martinets","stopId":"1:370","stopCode":"370","lon":-71.047688,"lat":48.401531,"arrival":1512393537000,"departure":1512393538000,"stopIndex":15,"stopSequence":16,"vertexType":"TRANSIT"},"legGeometry":{"points":"S {mfHb {SPL | ExBp @ SDL @ @@ V LB | @Ĵ@ FLĴ@ GbCk @ | A] VESA ^ KBA | C {@ pCeACS〜CuA` @ Q","length":19},"rentedBike":false,"transitLeg":false,"duration":728.0,"steps":[{"distance":131.991,"relativeDirection":"DEPART","streetName":"Rue D.-V.-Morrier","absoluteDirection":"SOUTH","stayOn":false,"area":false,"bogusName":false,"lon":-71.04961760502248,"lat":48.4090671692228,"elevation":[]},{"distance":72.319,"relativeDirection":"LEFT","streetName":"Rue Lorenzo-Genest","absoluteDirection":"EAST","stayOn":false,"area":false,"bogusName":false,"lon":-71.0502299,"lat":48.4079519,"elevation":[]}
When I try to open the file in a browser, I get this error message:
XML Parsing Error: not well-formed
Location: http://localhost:63342/XML_reader/file.xml?_ijt=e1d6h53s4mh1ak94sqortejf9v
Line Number 1, Column 1: ...
The script I am using is very simple and looks like this:
import urllib.request
testfile = urllib.request.URLopener()
file_name = 'http://localhost:8080/otp/routers/default/plan?fromPlace=48.40915,%20-71.04996&toPlace=48.41428,%20-71.06996&date=2017/12/04&time=8:00:00&mode=TRANSIT,WALK'
testfile.retrieve(file_name, "file.xml")
How can I make the output XML file well-formed? Is there anything other than urllib.request that I could try?
Many thanks.
Answer 0 (score: 1)
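The server is returning JSON, not XML, which is why the browser's XML parser rejects the file. To import it as JSON data, you need the json library: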
import json
import urllib.request
from pprint import pprint

# urlretrieve downloads the URL to a local file (URLopener is deprecated).
file_name = 'http://localhost:8080/otp/routers/default/plan?fromPlace=48.40915,%20-71.04996&toPlace=48.41428,%20-71.06996&date=2017/12/04&time=8:00:00&mode=TRANSIT,WALK'
urllib.request.urlretrieve(file_name, "file.json")

# Parse the saved file into Python dicts and lists, closing it afterwards.
with open("file.json", encoding="utf-8") as f:
    data = json.load(f)
pprint(data)
json.load reads the JSON data and converts it to Python objects (https://docs.python.org/2/library/json.html?highlight=json%20load#json.load).
pprint "pretty-prints" the JSON data (https://docs.python.org/2/library/pprint.html).
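Since the goal is to run many trips and analyse them, it may be simpler to skip the intermediate file and parse the response in memory. The following is only a sketch under a few assumptions: the helper names (build_plan_url, fetch_plan, summarize_itineraries) are made up for illustration, the base URL is the one from the question, and the keys read from each itinerary are the ones visible in the sample output above. Other OpenTripPlanner versions may return a different structure.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the question; adjust it to your own server.
BASE_URL = "http://localhost:8080/otp/routers/default/plan"


def build_plan_url(from_place, to_place, date, time, mode="TRANSIT,WALK"):
    """Assemble the plan URL; urlencode escapes the query values."""
    query = urllib.parse.urlencode({
        "fromPlace": from_place,
        "toPlace": to_place,
        "date": date,
        "time": time,
        "mode": mode,
    })
    return f"{BASE_URL}?{query}"


def fetch_plan(url):
    """Download and parse the JSON response (needs a running OTP server)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_itineraries(data):
    """Return one small dict per itinerary.

    The keys used here are the ones seen in the sample output; they are
    an assumption about the response layout, not a documented contract.
    """
    rows = []
    for itinerary in data.get("plan", {}).get("itineraries", []):
        rows.append({
            "duration_s": itinerary["duration"],
            "walk_time_s": itinerary["walkTime"],
            "transit_time_s": itinerary["transitTime"],
            "transfers": itinerary["transfers"],
            "modes": [leg["mode"] for leg in itinerary["legs"]],
        })
    return rows
```

With the server running, something like summarize_itineraries(fetch_plan(build_plan_url("48.40915,-71.04996", "48.41428,-71.06996", "2017/12/04", "8:00:00"))) should give a list of plain dicts that is easy to loop over or hand to an analysis library.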