I am trying to read the JSON string returned by the OpenStreetMap (Overpass) API, which is valid JSON.
I am using the following code:
import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232
# Top right
maxLat = 51.1390
maxLon = 13.89873
osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)
osmdata = osm.json()
osmdataframe = pd.read_json(osmdata)
This raises the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-66-304b7fbfb645> in <module>()
----> 1 osmdataframe = pd.read_json(osmdata)
/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
196 obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
197 keep_default_dates, numpy, precise_float,
--> 198 date_unit).parse()
199
200 if typ == 'series' or obj is None:
/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
264
265 else:
--> 266 self._parse_no_numpy()
267
268 if self.obj is None:
/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
481 if orient == "columns":
482 self.obj = DataFrame(
--> 483 loads(json, precise_float=self.precise_float), dtype=None)
484 elif orient == "split":
485 decoded = dict((str(k), v)
TypeError: Expected String or Unicode
How can I change the request or the pandas read_json call to avoid this error? And, by the way, what is the actual problem?
Answer 0 (score: 13)
If you write the JSON string to a file,
content = osm.content  # raw bytes of the response body
with open('/tmp/out', 'wb') as f:
    f.write(content)
you will see something like:
{
"version": 0.6,
"generator": "Overpass API",
"osm3s": {
"timestamp_osm_base": "2014-07-20T07:52:02Z",
"copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
},
"elements": [
{
"type": "node",
"id": 536694,
"lat": 50.9849256,
"lon": 13.6821776,
"tags": {
"highway": "bus_stop",
"name": "Niederhäslich Bergmannsweg"
}
},
...]}
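If the JSON string is converted to a Python object, the result is a dict whose elements key holds a list of dicts. The vast majority of the data is in this list of dicts.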
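To make that structure concrete, here is a minimal sketch using only the standard library. The sample string is an abbreviated, made-up stand-in for the real response. Note that osm.json() in the question already performs this parsing step and returns a dict rather than a string, which appears to be why read_json complains that it expected a string.

```python
import json

# Abbreviated, hypothetical stand-in for the body returned by the Overpass API.
sample = '''
{
  "version": 0.6,
  "generator": "Overpass API",
  "elements": [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}}
  ]
}
'''

data = json.loads(sample)        # roughly what osm.json() does for you
top_level_keys = sorted(data)    # the keys of the outer dict
elements = data['elements']      # list of dicts, one per OSM element
first = elements[0]

print(type(data).__name__, top_level_keys)
print(type(elements).__name__, type(first['tags']).__name__)
```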
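This JSON string cannot be converted directly into a pandas object. What would be the index, and what would be the columns?
You surely do not want [u'elements', u'version', u'osm3s', u'generator'] to be the columns, since almost all of the information is in the elements list.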
But if you want the DataFrame to contain only the data in the elements list, you have to say so explicitly, because pandas cannot make that assumption for you.
Further complicating things, each dict in elements is itself a nested dict. Consider the first dict in elements:
{
"type": "node",
"id": 536694,
"lat": 50.9849256,
"lon": 13.6821776,
"tags": {
"highway": "bus_stop",
"name": "Niederhäslich Bergmannsweg"
}
}
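Should ['lat', 'lon', 'type', 'id', 'tags'] be the columns?
That seems plausible, except that the tags column would end up being a column of dicts, which is usually not very useful. It would probably be nicer if the keys in the tags dict became columns themselves. We can do that, but again we have to code it ourselves, since pandas has no way of knowing that this is what we want.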
import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232
# Top right
maxLat = 51.1390
maxLon = 13.89873
osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)
osmdata = osm.json()
osmdata = osmdata['elements']
for dct in osmdata:
    # Promote each tag to a top-level key so that it becomes its own column.
    for key, val in dct['tags'].items():
        dct[key] = val
    del dct['tags']
osmdataframe = pd.DataFrame(osmdata)
print(osmdataframe[['lat', 'lon', 'name']].head())
yields
lat lon name
0 50.984926 13.682178 Niederhäslich Bergmannsweg
1 51.123623 13.782789 Sagarder Weg
2 51.065752 13.895734 Weißig, Einkaufszentrum
3 51.007140 13.698498 Stuttgarter Straße
4 51.010199 13.701411 Heilbronner Straße
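The loop above modifies the dicts in place and assumes every element has a tags key. As a hedged variation, the flattening step can be sketched as a small helper that returns new dicts, tolerates elements without tags, and optionally prefixes tag names so that a tag cannot overwrite a field such as id or type. The name flatten_element and its prefix parameter are hypothetical, and the sample elements are made up in the shape shown earlier.

```python
def flatten_element(element, prefix=''):
    """Return a new dict with the entries of element['tags'] at the top level.

    Hypothetical helper: `prefix` is prepended to every tag key so that a tag
    cannot silently overwrite a field such as 'id' or 'type'.
    """
    # Copy everything except the nested tags dict.
    flat = {key: val for key, val in element.items() if key != 'tags'}
    # Elements without a 'tags' key contribute no extra columns.
    for key, val in element.get('tags', {}).items():
        flat[prefix + key] = val
    return flat


# Made-up elements; the second one has no tags at all.
elements = [
    {'type': 'node', 'id': 536694, 'lat': 50.9849256, 'lon': 13.6821776,
     'tags': {'highway': 'bus_stop', 'name': 'Niederhäslich Bergmannsweg'}},
    {'type': 'node', 'id': 1, 'lat': 51.0, 'lon': 13.7},
]

rows = [flatten_element(element, prefix='tag_') for element in elements]
print(sorted(rows[0]))
print(sorted(rows[1]))
```

The resulting rows can be passed to pd.DataFrame(rows) in the same way as osmdata above; rows that lack a given tag get NaN in that column.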