json_normalize behaves inconsistently with deeply nested JSON data

Time: 2019-05-05 14:47:41

Tags: python json pandas

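I am trying to import deeply nested JSON into pandas (v0.24.2) using json_normalize, and I am running into some inconsistencies that I would like to resolve. A sample JSON is shown below; its structure is not constant across records, as the 'Missing keyEB' entry in the second record shows.
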
json =  [   {'keyA': 1,
             'keyB': 2,
             'keyC': [{
                     'keyCA': 3,
                     'keyCB': {'keyCBA':4,
                               'keyCBB':5,
                               'keyCBC': [{'keyCBCA':6, 'keyCBCB':7, 'keyCBCC':8},
                                          {'keyCBCA':9, 'keyCBCB':10, 'keyCBCC':11},
                                          {'keyCBCA':12, 'keyCBCB':13, 'keyCBCC':14}],
                               'keyCBD':15},
                     'keyCC':16}],
            'keyD':17,
            'keyE': [{
                     'keyEA':18,
                     'keyEB': {'keyEBA':19,'keyEBB':20}
                     }]
            },{
            'keyA': 31,
            'keyB': 32,
            'keyC': [{
                    'keyCA': 33,
                    'keyCB': {'keyCBA': 34,
                              'keyCBB': 35,
                              'keyCBC': [{'keyCBCA': 36, 'keyCBCB': 37, 'keyCBCC': 38},
                                         {'keyCBCA': 39, 'keyCBCB': 40, 'keyCBCC': 41},
                                         {'keyCBCA': 42, 'keyCBCB': 43, 'keyCBCC': 44}],
                              'keyCBD': 45},
                    'keyCC': 46}],
            'keyD': 47,
            'keyE': [{
                    'keyEA': 48,
                    'Missing keyEB': 49
                    }]
            }]

The following calls show the behavior I expect from json_normalize, and return correctly normalized data:

The first level of the JSON is normalized correctly

from pandas.io.json import json_normalize
json_normalize(data = json)

   keyA  keyB       keyC  keyD       keyE
0     1     2  [{'key...    17  [{'key...
1    31    32  [{'key...    47  [{'key...

The second level, keyC, is normalized correctly

json_normalize(data = json, record_path = ['keyC'], meta = ['keyA']) 

   keyCA      keyCB  keyCC  keyA
0      3  {'keyC...     16     1
1     33  {'keyC...     46    31

The fourth level, keyCBC, is normalized correctly

json_normalize(data = json, record_path = ['keyC', 'keyCB', 'keyCBC'], meta = ['keyA'])

   keyCBCA  keyCBCB  keyCBCC  keyA
0        6        7        8     1
1        9       10       11     1
2       12       13       14     1
3       36       37       38    31
4       39       40       41    31
5       42       43       44    31

However, other branches appear to be normalized inconsistently.

The third level, keyCB ......

json_normalize(data = json, record_path = ['keyC', 'keyCB'], meta = ['keyA'])

        0  keyA
0  keyCBA     1
1  keyCBB     1
2  keyCBC     1
3  keyCBD     1
4  keyCBA    31
5  keyCBB    31
6  keyCBC    31
7  keyCBD    31

# Uhhhh! I was expecting
#    keyCBA   keyCBB   keyCBC   keyCBD   keyA
# 0       4        5   [{'key..     15      1
# 1      34       35   [{'key..     45     31
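
A possible explanation, offered as a guess and not confirmed anywhere in this post: record_path seems to expect every step of the path to end in a list of records, while keyCB is a single dict. Extending a list with a dict appends its keys, which would match the key names in column 0 above. The sketch below reproduces that effect in plain Python and then builds the rows I was expecting by hand. The helper name `keycb_rows` and the trimmed `data` list are made up for illustration and are not part of pandas.

```python
# Trimmed copy of the sample data: only the fields this sketch touches.
data = [
    {'keyA': 1,
     'keyC': [{'keyCA': 3,
               'keyCB': {'keyCBA': 4, 'keyCBB': 5,
                         'keyCBC': [{'keyCBCA': 6}], 'keyCBD': 15},
               'keyCC': 16}]},
    {'keyA': 31,
     'keyC': [{'keyCA': 33,
               'keyCB': {'keyCBA': 34, 'keyCBB': 35,
                         'keyCBC': [{'keyCBCA': 36}], 'keyCBD': 45},
               'keyCC': 46}]},
]

# Extending a list with a dict appends the dict's keys, not the dict itself.
collected = []
collected.extend(data[0]['keyC'][0]['keyCB'])
print(collected)  # ['keyCBA', 'keyCBB', 'keyCBC', 'keyCBD']


def keycb_rows(records):
    """Hypothetical helper: one row per keyCB dict, tagged with the parent keyA."""
    rows = []
    for record in records:
        for item in record.get('keyC', []):
            row = dict(item.get('keyCB', {}))  # shallow copy; keyCBC stays a list
            row['keyA'] = record.get('keyA')
            rows.append(row)
    return rows


rows = keycb_rows(data)
print(rows[0])
```

Passing `rows` to `pandas.DataFrame` should give one row per keyCB dict, with keyCBC still held as a list in its cell; the column order may differ between pandas versions.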

And the following call blows up completely with a KeyError, caused by the missing keyEB

json_normalize(data = json, record_path = ['keyE', 'keyEB'], meta = ['keyA'])

Traceback (most recent call last):......
KeyError: 'keyEB'

#I was expecting
#      keyEBA   keyEBB   keyA
# 0        19       20      1
# 1       NaN      NaN     31
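
For comparison, here is a minimal sketch of the tolerant lookup I have in mind, written in plain Python so it does not depend on any particular pandas version. It assumes a missing keyEB should simply contribute no columns for that record. The helper name `keyeb_rows` and the trimmed `data` list are hypothetical.

```python
# Trimmed copy of the sample data: only keyA and keyE are needed here.
data = [
    {'keyA': 1,
     'keyE': [{'keyEA': 18, 'keyEB': {'keyEBA': 19, 'keyEBB': 20}}]},
    {'keyA': 31,
     'keyE': [{'keyEA': 48, 'Missing keyEB': 49}]},
]


def keyeb_rows(records):
    """Hypothetical helper: one row per keyE entry, tolerating a missing keyEB."""
    rows = []
    for record in records:
        for item in record.get('keyE', []):
            # .get() with a default avoids the KeyError that plain indexing raises
            row = dict(item.get('keyEB', {}))
            row['keyA'] = record.get('keyA')
            rows.append(row)
    return rows


rows = keyeb_rows(data)
print(rows)
# [{'keyEBA': 19, 'keyEBB': 20, 'keyA': 1}, {'keyA': 31}]
```

Building a frame with `pandas.DataFrame(rows)` should then fill keyEBA and keyEBB with NaN for the second record, which is the shape shown in the comment above.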

Is there any simple way to get consistent behavior from json_normalize?

0 answers:

No answers yet