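I am trying to import deeply nested JSON into pandas (v0.24.2) using json_normalize
and am running into some inconsistencies that I would like to resolve.
A sample JSON follows; its format is fixed apart from the entry marked Missing keyEB.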
json = [
    {'keyA': 1,
     'keyB': 2,
     'keyC': [{
         'keyCA': 3,
         'keyCB': {'keyCBA': 4,
                   'keyCBB': 5,
                   'keyCBC': [{'keyCBCA': 6, 'keyCBCB': 7, 'keyCBCC': 8},
                              {'keyCBCA': 9, 'keyCBCB': 10, 'keyCBCC': 11},
                              {'keyCBCA': 12, 'keyCBCB': 13, 'keyCBCC': 14}],
                   'keyCBD': 15},
         'keyCC': 16}],
     'keyD': 17,
     'keyE': [{
         'keyEA': 18,
         'keyEB': {'keyEBA': 19, 'keyEBB': 20}
     }]
    }, {
     'keyA': 31,
     'keyB': 32,
     'keyC': [{
         'keyCA': 33,
         'keyCB': {'keyCBA': 34,
                   'keyCBB': 35,
                   'keyCBC': [{'keyCBCA': 36, 'keyCBCB': 37, 'keyCBCC': 38},
                              {'keyCBCA': 39, 'keyCBCB': 40, 'keyCBCC': 41},
                              {'keyCBCA': 42, 'keyCBCB': 43, 'keyCBCC': 44}],
                   'keyCBD': 45},
         'keyCC': 46}],
     'keyD': 47,
     'keyE': [{
         'keyEA': 48,
         'Missing keyEB': 49
     }]
    }]
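The following code shows the expected behavior of json_normalize, extracting correctly normalized data:
The first level of the JSON is normalized correctly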
from pandas.io.json import json_normalize
json_normalize(data = json)
   keyA  keyB       keyC  keyD       keyE
0     1     2  [{'key...    17  [{'key...
1    31    32  [{'key...    47  [{'key...
The second level, keyC, is normalized correctly
json_normalize(data = json, record_path = ['keyC'], meta = ['keyA'])
   keyCA      keyCB  keyCC  keyA
0      3  {'keyC...     16     1
1     33  {'keyC...     46    31
The fourth level, keyCBC, is normalized correctly
json_normalize(data = json, record_path = ['keyC', 'keyCB', 'keyCBC'], meta = ['keyA'])
   keyCBCA  keyCBCB  keyCBCC  keyA
0        6        7        8     1
1        9       10       11     1
2       12       13       14     1
3       36       37       38    31
4       39       40       41    31
5       42       43       44    31
However, other branches seem to be normalized inconsistently.
The third level, keyCB
......
json_normalize(data = json, record_path = ['keyC', 'keyCB'], meta = ['keyA'])
        0  keyA
0  keyCBA     1
1  keyCBB     1
2  keyCBC     1
3  keyCBD     1
4  keyCBA    31
5  keyCBB    31
6  keyCBC    31
7  keyCBD    31
# Uhhhh! I was expecting
#    keyCBA  keyCBB      keyCBC  keyCBD  keyA
# 0       4       5  [{'key..       15     1
# 1      34      35  [{'key..       45    31
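A likely explanation, offered as an assumption rather than a confirmed diagnosis: record_path seems to expect every step to end in a list of records, whereas keyCB is a plain dict, so iterating it yields its key names instead of one record. One possible workaround is to build the rows by hand. The sketch below is plain Python so that it does not depend on a particular pandas version; the helper name flatten_keycb and the trimmed-down sample are hypothetical, and keyCBC is left out of the sample for brevity.

```python
# Trimmed-down stand-in for the JSON above (keyCBC omitted for brevity).
sample = [
    {'keyA': 1,
     'keyC': [{'keyCA': 3,
               'keyCB': {'keyCBA': 4, 'keyCBB': 5, 'keyCBD': 15},
               'keyCC': 16}]},
    {'keyA': 31,
     'keyC': [{'keyCA': 33,
               'keyCB': {'keyCBA': 34, 'keyCBB': 35, 'keyCBD': 45},
               'keyCC': 46}]},
]


def flatten_keycb(records):
    """Return one flat row per keyC entry, built from its keyCB dict plus keyA.

    Hypothetical helper: it treats each keyCB dict as a single record,
    instead of iterating over the dict's keys.
    """
    rows = []
    for record in records:
        for c_entry in record.get('keyC', []):
            row = dict(c_entry.get('keyCB', {}))  # shallow copy of keyCB's fields
            row['keyA'] = record['keyA']          # carry keyA along as metadata
            rows.append(row)
    return rows


rows = flatten_keycb(sample)
print(rows)
```

Passing rows to pandas.DataFrame should then give one row per record with keyCBA, keyCBB, keyCBD and keyA as columns.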
The following, on the other hand, fails because of the missing keyEB:
json_normalize(data = json, record_path = ['keyE', 'keyEB'], meta = ['keyA'])
Traceback (most recent call last):......
KeyError: 'keyEB'
# I was expecting
#    keyEBA  keyEBB  keyA
# 0      19      20     1
# 1     NaN     NaN    31
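For the missing-key case, one option is to fill the gaps before the data ever reaches pandas. This is a minimal sketch in plain Python, assuming the set of expected columns is known in advance; the helper name flatten_keyeb, its columns parameter, and the cut-down sample are all hypothetical.

```python
# Cut-down stand-in for the JSON above: the second record lacks keyEB.
sample = [
    {'keyA': 1,
     'keyE': [{'keyEA': 18, 'keyEB': {'keyEBA': 19, 'keyEBB': 20}}]},
    {'keyA': 31,
     'keyE': [{'keyEA': 48, 'Missing keyEB': 49}]},
]


def flatten_keyeb(records, columns=('keyEBA', 'keyEBB')):
    """Return one row per keyE entry, with None where keyEB or a field is absent.

    Hypothetical helper: `columns` lists the fields expected inside keyEB.
    """
    rows = []
    for record in records:
        for e_entry in record.get('keyE', []):
            nested = e_entry.get('keyEB') or {}               # tolerate a missing keyEB
            row = {col: nested.get(col) for col in columns}   # None for absent fields
            row['keyA'] = record['keyA']                      # carry keyA along as metadata
            rows.append(row)
    return rows


rows = flatten_keyeb(sample)
print(rows)
```

Building a pandas.DataFrame from these rows should give the layout I was expecting, with the None entries shown as missing values.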
Is there any simple way to get consistent behavior from json_normalize?