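I'm trying to aggregate the responses from an API call that returns JSON objects, and get some frequency counts.
I've managed to do this for one field in the JSON response, but the same approach on a second field isn't working.
Both fields are called "category", but the one that doesn't work is nested inside "outcome_status".
The error I get is KeyError: 'category'.
The following code uses a public API that requires no authentication, so it can easily be tested.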
import requests
#make a polygon for use in the API call
lat_coord = 51.767538
long_coord = -1.497488
lat_upper = str(lat_coord + 0.02)
lat_lower = str(lat_coord - 0.02)
long_upper = str(long_coord + 0.02)
long_lower = str(long_coord - 0.02)
#call from the API - no authentication required
api_call = ("https://data.police.uk/api/crimes-street/all-crime?poly=" +
            lat_lower + "," + long_upper + ":" +
            lat_lower + "," + long_lower + ":" +
            lat_upper + "," + long_lower + ":" +
            lat_upper + "," + long_upper + "&date=2017-01")
print(api_call)
request_resp=requests.get(api_call).json()
import pandas as pd
import numpy as np
df_resp = pd.DataFrame(request_resp)
#frequency counts for non-nested field (this works)
df_resp.groupby('category').context.count()
#next bit tries to do the nested (this doesn't work)
#tried dropping nulls
df_outcome = df_resp['outcome_status'].dropna()
print(df_outcome)
#tried index reset (the result is not assigned, so df_outcome is unchanged)
df_outcome.reset_index()
#raises KeyError: 'category'
df_outcome.groupby('category').date.count()
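Why the last line fails: df_resp['outcome_status'] is a Series whose elements are plain dicts (or None), not a DataFrame with a `category` column. Given a string, Series.groupby looks for an index level with that name, finds none, and raises KeyError: 'category'. The sketch below reproduces this offline, using a few made-up records shaped like the API response (an assumption; only the two relevant fields are included, and the values are invented). It then shows one way to pull the nested value out with Series.map before counting:

```python
import pandas as pd

# Made-up records standing in for the API response (not real data).
records = [
    {"category": "burglary",
     "outcome_status": {"category": "Under investigation", "date": "2017-01"}},
    {"category": "anti-social-behaviour", "outcome_status": None},
    {"category": "vehicle-crime",
     "outcome_status": {"category": "Under investigation", "date": "2017-02"}},
    {"category": "burglary",
     "outcome_status": {"category": "Local resolution", "date": "2017-01"}},
]
df_resp = pd.DataFrame(records)

outcome = df_resp["outcome_status"].dropna()
print(type(outcome.iloc[0]))  # each element is a plain dict, not a row

# Grouping a Series by a string looks for an index level of that name.
raised = False
try:
    outcome.groupby("category")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Extract the nested value from each dict (None where there is no outcome),
# then count; value_counts() skips the missing values.
outcome_category = df_resp["outcome_status"].map(
    lambda s: s["category"] if isinstance(s, dict) else None
)
counts = outcome_category.value_counts()
print(counts)
```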
Answer (score: 1)
I think you'll have the easiest time if you expand the dicts in the "outcome_status"
column into columns of their own:
outcome_status = [
{'outcome_status_' + k: v for k, v in z.items()} for z in (
dict(category=None, date=None) if x is None else x
for x in (y['outcome_status'] for y in request_resp)
)
]
df = pd.concat([df_resp.drop('outcome_status', axis=1),
pd.DataFrame(outcome_status)], axis=1)
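This uses a couple of comprehensions to rename the fields inside outcome_status,
prepending "outcome_status_" to each key name, and then turns them into columns.
It also expands None values into a dict of Nones, so rows without an outcome
simply get empty values in the new columns.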
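To see what the comprehension produces without calling the API, here is a sketch that runs it on a small hypothetical request_resp (made-up records, not real API output):

```python
import pandas as pd

# Hypothetical stand-in for request_resp (invented values).
request_resp = [
    {"category": "burglary",
     "outcome_status": {"category": "Under investigation", "date": "2017-01"}},
    {"category": "anti-social-behaviour", "outcome_status": None},
    {"category": "burglary",
     "outcome_status": {"category": "Local resolution", "date": "2017-02"}},
]
df_resp = pd.DataFrame(request_resp)

# Same comprehension as above: replace None with a dict of Nones,
# then prefix every key with "outcome_status_".
outcome_status = [
    {"outcome_status_" + k: v for k, v in z.items()}
    for z in (
        dict(category=None, date=None) if x is None else x
        for x in (y["outcome_status"] for y in request_resp)
    )
]

# concat(axis=1) aligns on the index; both frames use the default 0..n-1
# index in the same record order, so the rows line up.
df = pd.concat(
    [df_resp.drop("outcome_status", axis=1), pd.DataFrame(outcome_status)],
    axis=1,
)
print(df.columns.tolist())
# ['category', 'outcome_status_category', 'outcome_status_date']

# Rows whose outcome is None are dropped from the grouping keys.
counts = df.groupby("outcome_status_category").category.count()
print(counts)
```

Note that y["outcome_status"] assumes every record carries the key, even if its value is null. If some records could omit it entirely, y.get("outcome_status") would be the safer lookup; whether the API ever omits it is not verified here.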
import requests
import pandas as pd
# make a polygon for use in the API call
lat_coord = 51.767538
long_coord = -1.497488
lat_upper = str(lat_coord + 0.02)
lat_lower = str(lat_coord - 0.02)
long_upper = str(long_coord + 0.02)
long_lower = str(long_coord - 0.02)
# call from the API - no authentication required
api_call = ("https://data.police.uk/api/crimes-street/all-crime?poly=" +
lat_lower + "," + long_upper + ":" +
lat_lower + "," + long_lower + ":" +
lat_upper + "," + long_lower + ":" +
lat_upper + "," + long_upper + "&date=2017-01")
request_resp = requests.get(api_call).json()
df_resp = pd.DataFrame(request_resp)
outcome_status = [
{'outcome_status_' + k: v for k, v in z.items()} for z in (
dict(category=None, date=None) if x is None else x
for x in (y['outcome_status'] for y in request_resp)
)
]
df = pd.concat([df_resp.drop('outcome_status', axis=1),
pd.DataFrame(outcome_status)], axis=1)
# frequency counts for the nested field (this now works)
print(df.groupby('outcome_status_category').category.count())
Output:
outcome_status_category
Court result unavailable                          4
Investigation complete; no suspect identified    38
Local resolution                                  1
Offender given a caution                          2
Offender given community sentence                 3
Offender given conditional discharge              1
Offender given penalty notice                     2
Status update unavailable                         6
Suspect charged as part of another case           1
Unable to prosecute suspect                       9
Name: category, dtype: int64
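Once the nested field is its own column, value_counts() is a shorter way to get the same frequencies, as long as the `category` column has no missing values. groupby(...).category.count() counts non-null `category` entries per outcome, while value_counts() counts rows. That non-null assumption is mine, not something checked against the API. A minimal sketch on a made-up flattened frame:

```python
import pandas as pd

# Hypothetical flattened frame shaped like `df` above (invented values).
df = pd.DataFrame({
    "category": ["burglary", "burglary", "vehicle-crime", "anti-social-behaviour"],
    "outcome_status_category": ["Local resolution", "Under investigation",
                                "Local resolution", None],
})

# The approach above: count non-null `category` values per outcome.
by_groupby = df.groupby("outcome_status_category").category.count()

# value_counts() counts rows per outcome and skips missing outcomes;
# sort_index() puts the labels in the same order as the groupby result.
by_value_counts = df["outcome_status_category"].value_counts().sort_index()

print(by_groupby)
print(by_value_counts)
```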