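Parsing nested JSON with Python: TypeError: list indices must be integers, not str

Time: 2019-07-11 02:22:42

Tags: python json pandas

I have nested data that I want to load from JSON into a Pandas DataFrame, but because the JSON is nested, my attempt raises an error.

Here is the data: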

{"data":[{"date":"2018-08-20T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_1","Events":[{"EventCategory":"Scan","EventCategoryData":[{"name":"scanname","info":[{"type":"any","count":8.0}]},{"name":"scanname","info":[{"type":"any","count":1.0}]}],"scancount":2.0},{"EventCategory":"Web","EventCategoryData":[{"name":"web_Scan","info":[{"type":"Web","count":2.0}]},{"name":"web scan 2","info":[{"type":"Web 2","count":0.0}]},{"name":"web 3 ","info":[{"type":"Web 3","count":2.0}]}]},{"EventCategory":"WWW","EventCategoryData":[{"name":"any","info":[{"type":"wifi","count":2.0}]}],"scancount":4.0},{"EventCategory":"Others","EventCategoryData":[{"name":"anything","info":[{"previousversion":"default","updatedversion":"default"}]}]}]}]},{"date":"2018-08-22T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_2","Events":[{"EventCategory":"Scan2","EventCategoryData":[{"name":"scan name","info":[{"type":"scan 2","count":2}]},{"name":"update","info":[{"type":"scan","count":1},{"type":"WWW","count":1}]}],"scancount":1},{"EventCategory":"Web","EventCategoryData":[{"name":"web1","info":[{"type":"WWW","count":1}]},{"name":"Wifi","info":[{"type":"Web Sites","count":1}]},{"name":"web2","info":[{"type":"scan","count":1}]}]}]}]}],"status":"success"}

I tried json_normalize:

normalize_data = json_normalize(data['data'],['values'], record_path ='EventCategory' ,errors='ignore')

TypeError: json_normalize() got multiple values for argument 'record_path'

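I want to build a DataFrame with all keys as columns and the values as rows. Any help is appreciated.

1 Answer:

Answer 0: (score: 0)

json_normalize() cannot do this in a fully generic way. You can use the record_path and meta parameters to tell it how to process the JSON.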
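As a small illustration of those two parameters, here is a minimal sketch on a hypothetical toy structure (the `customers` data and its field names are invented for this example and are not the question's payload). `record_path` names the chain of keys leading to the innermost list, and `meta` names the outer fields to repeat on each resulting row.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate record_path and meta.
customers = [
    {
        "customer": "alice",
        "orders": [
            {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
            {"order_id": 2, "items": [{"sku": "C", "qty": 5}]},
        ],
    }
]

# record_path walks customers -> orders -> items; every item becomes one row.
# meta repeats outer fields on each row; a nested list is a path to a deeper field.
frame = pd.json_normalize(
    customers,
    record_path=["orders", "items"],
    meta=["customer", ["orders", "order_id"]],
)
print(frame)
```

The nested meta path is joined with the default separator, so the order id should appear in a column named `orders.order_id`.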

from pandas import json_normalize  # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize

data ={"data":[{"date":"2018-08-20T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_1","Events":[{"EventCategory":"Scan","EventCategoryData":[{"name":"scanname","info":[{"type":"any","count":8.0}]},{"name":"scanname","info":[{"type":"any","count":1.0}]}],"scancount":2.0},{"EventCategory":"Web","EventCategoryData":[{"name":"web_Scan","info":[{"type":"Web","count":2.0}]},{"name":"web scan 2","info":[{"type":"Web 2","count":0.0}]},{"name":"web 3 ","info":[{"type":"Web 3","count":2.0}]}]},{"EventCategory":"WWW","EventCategoryData":[{"name":"any","info":[{"type":"wifi","count":2.0}]}],"scancount":4.0},{"EventCategory":"Others","EventCategoryData":[{"name":"anything","info":[{"previousversion":"default","updatedversion":"default"}]}]}]}]},{"date":"2018-08-22T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_2","Events":[{"EventCategory":"Scan2","EventCategoryData":[{"name":"scan name","info":[{"type":"scan 2","count":2}]},{"name":"update","info":[{"type":"scan","count":1},{"type":"WWW","count":1}]}],"scancount":1},{"EventCategory":"Web","EventCategoryData":[{"name":"web1","info":[{"type":"WWW","count":1}]},{"name":"Wifi","info":[{"type":"Web Sites","count":1}]},{"name":"web2","info":[{"type":"scan","count":1}]}]}]}]}],"status":"success"}

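# Merge the 'values' lists from every entry of data['data'] into a single list
flat_list = [item for sublist in data['data'] for item in sublist['values']]
# record_path walks Events -> EventCategoryData -> info; each 'info' entry becomes one row.
# meta lists the outer fields that are copied onto every row.
# Note: the column labelled Events.EventCategory.name holds the 'name' of each EventCategoryData entry.
result = json_normalize(flat_list, record_path=['Events','EventCategoryData','info'],
                        meta=['account','device','deviceModel','id',['Events','EventCategory'],
                              ['Events','EventCategory','name']])
print(result)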

Output:

    count previousversion       type updatedversion    account    device deviceModel    id Events.EventCategory Events.EventCategory.name
0     8.0             NaN        any            NaN  account_1  device_1     testdev  id_1                 Scan                  scanname
1     1.0             NaN        any            NaN  account_1  device_1     testdev  id_1                 Scan                  scanname
2     2.0             NaN        Web            NaN  account_1  device_1     testdev  id_1                  Web                  web_Scan
3     0.0             NaN      Web 2            NaN  account_1  device_1     testdev  id_1                  Web                web scan 2
4     2.0             NaN      Web 3            NaN  account_1  device_1     testdev  id_1                  Web                    web 3 
5     2.0             NaN       wifi            NaN  account_1  device_1     testdev  id_1                  WWW                       any
6     NaN         default        NaN        default  account_1  device_1     testdev  id_1               Others                  anything
7     2.0             NaN     scan 2            NaN  account_1  device_1     testdev  id_2                Scan2                 scan name
8     1.0             NaN       scan            NaN  account_1  device_1     testdev  id_2                Scan2                    update
9     1.0             NaN        WWW            NaN  account_1  device_1     testdev  id_2                Scan2                    update
10    1.0             NaN        WWW            NaN  account_1  device_1     testdev  id_2                  Web                      web1
11    1.0             NaN  Web Sites            NaN  account_1  device_1     testdev  id_2                  Web                      Wifi
12    1.0             NaN       scan            NaN  account_1  device_1     testdev  id_2                  Web                      web2
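On the error in the question: `record_path` is the second positional parameter of `json_normalize`, so passing `['values']` positionally and also writing `record_path=...` supplies the same argument twice. The sketch below reproduces that on a hypothetical two-row structure (`rows` is invented for illustration) and then shows the keyword-only form, assuming a pandas version that exposes `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical minimal input, not the question's data.
rows = [{"id": 1, "values": [{"a": 1}, {"a": 2}]}]

try:
    # The second positional argument is already record_path,
    # so the keyword below passes it a second time.
    pd.json_normalize(rows, ["values"], record_path="values")
except TypeError as exc:
    message = str(exc)
    print(message)

# Passing record_path and meta by keyword avoids the clash.
fixed = pd.json_normalize(rows, record_path="values", meta=["id"])
print(fixed)
```

Naming every argument after the data makes it harder to hit this clash by accident.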

Update

# Merge the 'values' lists from every entry of data['data'] into a single list,
# copying each entry's 'date' onto every item so that 'date' can be used as a meta column.
flat_list = []
for sublist in data['data']:
    for item in sublist['values']:
        # dict(item, date=...) makes a shallow copy, so the original data is not modified
        flat_list.append(dict(item, date=sublist['date']))

result = json_normalize(flat_list, record_path=['Events','EventCategoryData','info'],
                        meta=['account','device','deviceModel','id','date',['Events','EventCategory'],
                              ['Events','EventCategory','name']])

print(result)

Output:

    count previousversion       type updatedversion  ...    id                 date Events.EventCategory Events.EventCategory.name
0     8.0             NaN        any            NaN  ...  id_1  2018-08-20T00:00:00                 Scan                  scanname
1     1.0             NaN        any            NaN  ...  id_1  2018-08-20T00:00:00                 Scan                  scanname
2     2.0             NaN        Web            NaN  ...  id_1  2018-08-20T00:00:00                  Web                  web_Scan
3     0.0             NaN      Web 2            NaN  ...  id_1  2018-08-20T00:00:00                  Web                web scan 2
4     2.0             NaN      Web 3            NaN  ...  id_1  2018-08-20T00:00:00                  Web                    web 3 
5     2.0             NaN       wifi            NaN  ...  id_1  2018-08-20T00:00:00                  WWW                       any
6     NaN         default        NaN        default  ...  id_1  2018-08-20T00:00:00               Others                  anything
7     2.0             NaN     scan 2            NaN  ...  id_2  2018-08-22T00:00:00                Scan2                 scan name
8     1.0             NaN       scan            NaN  ...  id_2  2018-08-22T00:00:00                Scan2                    update
9     1.0             NaN        WWW            NaN  ...  id_2  2018-08-22T00:00:00                Scan2                    update
10    1.0             NaN        WWW            NaN  ...  id_2  2018-08-22T00:00:00                  Web                      web1
11    1.0             NaN  Web Sites            NaN  ...  id_2  2018-08-22T00:00:00                  Web                      Wifi
12    1.0             NaN       scan            NaN  ...  id_2  2018-08-22T00:00:00                  Web                      web2

[13 rows x 11 columns]
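If the nesting depth is not known in advance, one alternative is to flatten the structure by hand before building the DataFrame. The following is only a sketch of that idea and is not part of `json_normalize`: `flatten_records` is a hypothetical helper, and it assumes every value is either a scalar or a list of dicts, which holds for the data in the question.

```python
def flatten_records(node, inherited=None):
    """Yield one flat dict per innermost record of a nested dict.

    Assumption: each value is a scalar or a list of dicts.
    Inner keys overwrite outer keys that share the same name,
    and a dict whose lists are all empty yields no rows.
    """
    row = dict(inherited or {})
    child_lists = []
    for key, value in node.items():
        if isinstance(value, list):
            child_lists.append(value)
        else:
            row[key] = value
    if not child_lists:
        # No deeper level: this dict is an innermost record.
        yield row
        return
    for child_list in child_lists:
        for child in child_list:
            yield from flatten_records(child, row)


# Hypothetical cut-down sample shaped like the question's payload.
sample = {
    "status": "success",
    "data": [{
        "date": "2018-08-20T00:00:00",
        "values": [{
            "id": "id_1",
            "Events": [{
                "EventCategory": "Scan",
                "EventCategoryData": [{
                    "name": "scanname",
                    "info": [{"type": "any", "count": 8.0},
                             {"type": "wifi", "count": 1.0}],
                }],
            }],
        }],
    }],
}

rows = list(flatten_records(sample))
print(rows)
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)`. Unlike the `json_normalize` call above, column names here are the bare keys without dotted prefixes, so this only suits data whose key names do not collide across levels.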