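Convert a dict of dicts to a DataFrame

Time: 2019-05-09 23:43:15

Tags: json pandas dataframe dictionary

I have a slightly complex JSON that I need to convert into a DataFrame. It is the standard JSON output of another API, so the field names will not change.

The dictionary below is more complex than anything I have worked with so far: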

>>> import pandas as pd
>>> data = [{'annotation_spec': {'description': 'Story_Driven',
...    'display_name': 'Story_Driven'},
...   'segments': [{'confidence': 0.52302074,
...     'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
...      'start_time_offset': {}}}]},
...  {'annotation_spec': {'description': 'real', 'display_name': 'real'},
...   'segments': [{'confidence': 0.5244379,
...     'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
...      'start_time_offset': {}}}]}]

I went through all the related SO posts, and the closest I got is this:

from pandas.io.json import json_normalize
pd.DataFrame.from_dict(json_normalize(
    data,
    record_path=['segments'],
    meta=[['annotation_spec', 'description'],
          ['annotation_spec', 'display_name']],
    errors='ignore'))

This gives me the following output:

>>> from pandas.io.json import json_normalize
>>> pd.DataFrame.from_dict(json_normalize(data,record_path=['segments'],meta=[['annotation_spec','description'],['annotation_spec','display_name']],errors='ignore'))
   confidence                                            segment annotation_spec.description annotation_spec.display_name
0    0.523021  {u'end_time_offset': {u'nanos': 973306000, u's...                Story_Driven                 Story_Driven
1    0.524438  {u'end_time_offset': {u'nanos': 973306000, u's...                        real                         real
>>>

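I would like to break the "segment" column above into its individual components. How can I do that?

1 answer:

Answer 0 (score: 1)

Basically, json_normalize handles nested dictionaries; the problem here is the list under the segments key.

So, if that list always has length 1, we can remove the list and then apply json_normalize: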

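# Imported as in the question; on pandas 1.0+ the same function is available as pd.json_normalize.
from pandas.io.json import json_normalize

# Replace every list value with its first element (only valid when each list has length 1).
def remove_list(dct):
    return {k: (v[0] if isinstance(v, list) else v) for k, v in dct.items()}

data_clean = [remove_list(entry) for entry in data]

json_normalize(data_clean, sep="__")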
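
If a segments list can hold more than one entry, taking only the first element silently drops the rest. Below is a minimal sketch of one way around that, assuming the record layout from the question. The helper names flatten, flatten_segments and to_frame are hypothetical (they are not pandas functions), and the two-segment sample is made up for illustration:

```python
def flatten(dct, sep="__", prefix=""):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in dct.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts; an empty dict contributes no columns.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


def flatten_segments(data, sep="__"):
    """Return one flat row per entry of each record's 'segments' list."""
    rows = []
    for record in data:
        # Fields shared by every segment of this record (everything but the list).
        shared = flatten({k: v for k, v in record.items() if k != "segments"}, sep)
        for segment in record.get("segments", []):
            rows.append({**shared, **flatten(segment, sep)})
    return rows


def to_frame(rows):
    # pandas is imported here so the helpers above stay dependency-free.
    import pandas as pd
    return pd.DataFrame(rows)


# Hypothetical sample: same shape as the question's data, but with two segments.
sample = [{
    "annotation_spec": {"description": "real", "display_name": "real"},
    "segments": [
        {"confidence": 0.5,
         "segment": {"end_time_offset": {"seconds": 14}, "start_time_offset": {}}},
        {"confidence": 0.7,
         "segment": {"end_time_offset": {"seconds": 20}, "start_time_offset": {}}},
    ],
}]
rows = flatten_segments(sample)
```

Calling to_frame(rows) should then give one DataFrame row per segment, with the annotation_spec fields repeated on each row. Note that in this sketch the empty start_time_offset dicts produce no column at all.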
