GroupBy frequency counts on a JSON response - nested field

Time: 2018-01-06 11:54:35

Tags: python json python-3.x pandas pandas-groupby

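I am trying to aggregate the response from an API call that returns JSON objects, in order to get some frequency counts.

I have managed to do this for one field in the JSON response, but the same approach does not work for a second field.

Both fields are called "category", but the one that does not work is nested inside "outcome_status".

The error I get is KeyError: 'category'

The code below uses a public API that requires no authentication, so it is easy to test.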

import simplejson
import requests

#make a polygon for use in the API call
lat_coord = 51.767538
long_coord = -1.497488
lat_upper = str(lat_coord + 0.02)
lat_lower = str(lat_coord - 0.02)
long_upper = str(long_coord + 0.02)
long_lower = str(long_coord - 0.02)

#call from the API - no authentication required
api_call="https://data.police.uk/api/crimes-street/all-crime?poly=" + lat_lower + "," + long_upper + ":" + lat_lower  + "," + long_lower + ":" + lat_upper + "," + long_lower + ":"  + lat_upper  + "," + long_upper + "&date=2017-01"
print (api_call)

request_resp=requests.get(api_call).json()

import pandas as pd
import numpy as np

df_resp = pd.DataFrame(request_resp)

#frequency counts for non-nested field (this works) 
df_resp.groupby('category').context.count()

#next bit tries to do the nested (this doesn't work)

#tried dropping nulls
df_outcome = df_resp['outcome_status'].dropna()
print(df_outcome)

#tried index reset
df_outcome.reset_index()

#just errors
df_outcome.groupby('category').date.count()

1 answer:

Answer 0 (score: 1)

I think you will have the easiest time if you expand the dicts in the "outcome_status" column:

Code:

outcome_status = [
    {'outcome_status_' + k: v for k, v in z.items()} for z in (
        dict(category=None, date=None) if x is None else x
        for x in (y['outcome_status'] for y in request_resp)
    )
]
df = pd.concat([df_resp.drop('outcome_status', axis=1),
                pd.DataFrame(outcome_status)], axis=1)

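This uses a few comprehensions to rename the fields in outcome_status by prepending "outcome_status_" to each key name, and then turns them into columns. It also expands None values into a dict with category and date set to None, so every row yields the same columns.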
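To see the expansion without calling the API, here is a minimal sketch run on a few made-up records. The sample values and the names sample_resp, df_sample, outcome_rows, df_flat and counts are assumptions for illustration only; the real response has more fields per record.

```python
import pandas as pd

# Hypothetical records that mimic the shape of the API response:
# a top-level "category" plus a nested (or missing) "outcome_status".
sample_resp = [
    {"category": "burglary", "context": "",
     "outcome_status": {"category": "Local resolution", "date": "2017-02"}},
    {"category": "burglary", "context": "", "outcome_status": None},
    {"category": "robbery", "context": "",
     "outcome_status": {"category": "Local resolution", "date": "2017-03"}},
]

df_sample = pd.DataFrame(sample_resp)

# Replace a missing outcome_status with a dict of None values, then
# prefix every key so it cannot clash with the top-level "category".
outcome_rows = [
    {"outcome_status_" + k: v for k, v in z.items()}
    for z in (
        dict(category=None, date=None) if x is None else x
        for x in (y["outcome_status"] for y in sample_resp)
    )
]

# Swap the column of dicts for the flattened columns.
df_flat = pd.concat(
    [df_sample.drop("outcome_status", axis=1), pd.DataFrame(outcome_rows)],
    axis=1,
)

# Rows whose outcome_status_category is None are left out of the groups.
counts = df_flat.groupby("outcome_status_category").category.count()
print(counts)
```

Because the prefixed names are ordinary columns of the frame, groupby can find "outcome_status_category" by name, which is not possible while the values are still dicts inside a single column.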

Test code:

import requests
import pandas as pd

# make a polygon for use in the API call
lat_coord = 51.767538
long_coord = -1.497488
lat_upper = str(lat_coord + 0.02)
lat_lower = str(lat_coord - 0.02)
long_upper = str(long_coord + 0.02)
long_lower = str(long_coord - 0.02)

# call from the API - no authentication required
api_call = ("https://data.police.uk/api/crimes-street/all-crime?poly=" +
            lat_lower + "," + long_upper + ":" +
            lat_lower + "," + long_lower + ":" +
            lat_upper + "," + long_lower + ":" +
            lat_upper + "," + long_upper + "&date=2017-01")

request_resp = requests.get(api_call).json()
df_resp = pd.DataFrame(request_resp)

outcome_status = [
    {'outcome_status_' + k: v for k, v in z.items()} for z in (
        dict(category=None, date=None) if x is None else x
        for x in (y['outcome_status'] for y in request_resp)
    )
]
df = pd.concat([df_resp.drop('outcome_status', axis=1),
                pd.DataFrame(outcome_status)], axis=1)

# frequency counts for the nested category field (this now works)
print(df.groupby('outcome_status_category').category.count())

Results:

outcome_status_category
Court result unavailable                          4
Investigation complete; no suspect identified    38
Local resolution                                  1
Offender given a caution                          2
Offender given community sentence                 3
Offender given conditional discharge              1
Offender given penalty notice                     2
Status update unavailable                         6
Suspect charged as part of another case           1
Unable to prosecute suspect                       9
Name: category, dtype: int64
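If only the counts are needed and not the flattened frame, a different technique from the one in this answer is to pull the nested value out of each dict and count it directly. This is a sketch under assumptions: the sample records and the names sample_resp, df_sample, outcome_category and outcome_counts are made up for illustration.

```python
import pandas as pd

# Hypothetical records with the same nesting as the API response.
sample_resp = [
    {"category": "burglary",
     "outcome_status": {"category": "Local resolution", "date": "2017-02"}},
    {"category": "burglary", "outcome_status": None},
    {"category": "robbery",
     "outcome_status": {"category": "Local resolution", "date": "2017-03"}},
    {"category": "robbery",
     "outcome_status": {"category": "Under investigation", "date": "2017-01"}},
]

df_sample = pd.DataFrame(sample_resp)

# Extract the nested "category" from each dict; rows with no
# outcome_status become None instead of raising an error.
outcome_category = df_sample["outcome_status"].map(
    lambda d: d["category"] if isinstance(d, dict) else None
)

# value_counts() skips missing values by default.
outcome_counts = outcome_category.value_counts()
print(outcome_counts)
```

This avoids building the extra columns, at the cost of losing the other outcome_status fields such as date.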