我有嵌套的数据,我想将其从JSON插入到Pandas数据框中,但是我的JSON是嵌套的并给出错误
下面是数据
{"data":[{"date":"2018-08-20T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_1","Events":[{"EventCategory":"Scan","EventCategoryData":[{"name":"scanname","info":[{"type":"any","count":8.0}]},{"name":"scanname","info":[{"type":"any","count":1.0}]}],"scancount":2.0},{"EventCategory":"Web","EventCategoryData":[{"name":"web_Scan","info":[{"type":"Web","count":2.0}]},{"name":"web scan 2","info":[{"type":"Web 2","count":0.0}]},{"name":"web 3 ","info":[{"type":"Web 3","count":2.0}]}]},{"EventCategory":"WWW","EventCategoryData":[{"name":"any","info":[{"type":"wifi","count":2.0}]}],"scancount":4.0},{"EventCategory":"Others","EventCategoryData":[{"name":"anything","info":[{"previousversion":"default","updatedversion":"default"}]}]}]}]},{"date":"2018-08-22T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_2","Events":[{"EventCategory":"Scan2","EventCategoryData":[{"name":"scan name","info":[{"type":"scan 2","count":2}]},{"name":"update","info":[{"type":"scan","count":1},{"type":"WWW","count":1}]}],"scancount":1},{"EventCategory":"Web","EventCategoryData":[{"name":"web1","info":[{"type":"WWW","count":1}]},{"name":"Wifi","info":[{"type":"Web Sites","count":1}]},{"name":"web2","info":[{"type":"scan","count":1}]}]}]}]}],"status":"success"}
我尝试了json_normalize
normalize_data = json_normalize(data['data'],['values'], record_path ='EventCategory' ,errors='ignore')
TypeError: json_normalize() got multiple values for argument 'record_path'
我想用所有键作为列,值作为行建立一个数据框。任何帮助都在这里
答案 0 :(得分:0)
json_normalize()-无法使用json_normalize()
以完全通用的方式执行此操作。您可以使用record_path
和meta
参数来指示如何处理JSON。
from pandas.io.json import json_normalize
data ={"data":[{"date":"2018-08-20T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_1","Events":[{"EventCategory":"Scan","EventCategoryData":[{"name":"scanname","info":[{"type":"any","count":8.0}]},{"name":"scanname","info":[{"type":"any","count":1.0}]}],"scancount":2.0},{"EventCategory":"Web","EventCategoryData":[{"name":"web_Scan","info":[{"type":"Web","count":2.0}]},{"name":"web scan 2","info":[{"type":"Web 2","count":0.0}]},{"name":"web 3 ","info":[{"type":"Web 3","count":2.0}]}]},{"EventCategory":"WWW","EventCategoryData":[{"name":"any","info":[{"type":"wifi","count":2.0}]}],"scancount":4.0},{"EventCategory":"Others","EventCategoryData":[{"name":"anything","info":[{"previousversion":"default","updatedversion":"default"}]}]}]}]},{"date":"2018-08-22T00:00:00","values":[{"account":"account_1","device":"device_1","deviceModel":"testdev","id":"id_2","Events":[{"EventCategory":"Scan2","EventCategoryData":[{"name":"scan name","info":[{"type":"scan 2","count":2}]},{"name":"update","info":[{"type":"scan","count":1},{"type":"WWW","count":1}]}],"scancount":1},{"EventCategory":"Web","EventCategoryData":[{"name":"web1","info":[{"type":"WWW","count":1}]},{"name":"Wifi","info":[{"type":"Web Sites","count":1}]},{"name":"web2","info":[{"type":"scan","count":1}]}]}]}]}],"status":"success"}
#merge all data['data] multiple list of data['value'] into single list
flat_list = [item for sublist in data['data'] for item in sublist['values']]
result = json_normalize(flat_list, record_path=['Events','EventCategoryData','info'],\
meta=['account','device','deviceModel','id',['Events','EventCategory'],\
['Events','EventCategory','name']])
print(result)
O / P:
count previousversion type updatedversion account device deviceModel id Events.EventCategory Events.EventCategory.name
0 8.0 NaN any NaN account_1 device_1 testdev id_1 Scan scanname
1 1.0 NaN any NaN account_1 device_1 testdev id_1 Scan scanname
2 2.0 NaN Web NaN account_1 device_1 testdev id_1 Web web_Scan
3 0.0 NaN Web 2 NaN account_1 device_1 testdev id_1 Web web scan 2
4 2.0 NaN Web 3 NaN account_1 device_1 testdev id_1 Web web 3
5 2.0 NaN wifi NaN account_1 device_1 testdev id_1 WWW any
6 NaN default NaN default account_1 device_1 testdev id_1 Others anything
7 2.0 NaN scan 2 NaN account_1 device_1 testdev id_2 Scan2 scan name
8 1.0 NaN scan NaN account_1 device_1 testdev id_2 Scan2 update
9 1.0 NaN WWW NaN account_1 device_1 testdev id_2 Scan2 update
10 1.0 NaN WWW NaN account_1 device_1 testdev id_2 Web web1
11 1.0 NaN Web Sites NaN account_1 device_1 testdev id_2 Web Wifi
12 1.0 NaN scan NaN account_1 device_1 testdev id_2 Web web2
更新:
#merge all data['data] multiple list into single list and merge date items into values sublist of dict.
flat_list = []
for sublist in data['data']:
new_list = [item for item in sublist['values']]
new_list[0]['date'] = sublist['date']
flat_list.extend(new_list)
result = json_normalize(flat_list, record_path=['Events','EventCategoryData','info'],\
meta=['account','device','deviceModel','id','date',['Events','EventCategory'],\
['Events','EventCategory','name']])
print(result)
O / P:
count previousversion type updatedversion ... id date Events.EventCategory Events.EventCategory.name
0 8.0 NaN any NaN ... id_1 2018-08-20T00:00:00 Scan scanname
1 1.0 NaN any NaN ... id_1 2018-08-20T00:00:00 Scan scanname
2 2.0 NaN Web NaN ... id_1 2018-08-20T00:00:00 Web web_Scan
3 0.0 NaN Web 2 NaN ... id_1 2018-08-20T00:00:00 Web web scan 2
4 2.0 NaN Web 3 NaN ... id_1 2018-08-20T00:00:00 Web web 3
5 2.0 NaN wifi NaN ... id_1 2018-08-20T00:00:00 WWW any
6 NaN default NaN default ... id_1 2018-08-20T00:00:00 Others anything
7 2.0 NaN scan 2 NaN ... id_2 2018-08-22T00:00:00 Scan2 scan name
8 1.0 NaN scan NaN ... id_2 2018-08-22T00:00:00 Scan2 update
9 1.0 NaN WWW NaN ... id_2 2018-08-22T00:00:00 Scan2 update
10 1.0 NaN WWW NaN ... id_2 2018-08-22T00:00:00 Web web1
11 1.0 NaN Web Sites NaN ... id_2 2018-08-22T00:00:00 Web Wifi
12 1.0 NaN scan NaN ... id_2 2018-08-22T00:00:00 Web web2
[13 rows x 11 columns]