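I have a somewhat complex JSON that I need to convert into a DataFrame. It is the standard JSON output of another API, so the field names will not change.
I have the dictionary below, which is more complex than anything I have worked with so far: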
>>> import pandas as pd
>>> data = [{'annotation_spec': {'description': 'Story_Driven',
... 'display_name': 'Story_Driven'},
... 'segments': [{'confidence': 0.52302074,
... 'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
... 'start_time_offset': {}}}]},
... {'annotation_spec': {'description': 'real', 'display_name': 'real'},
... 'segments': [{'confidence': 0.5244379,
... 'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
... 'start_time_offset': {}}}]}]
I went through all the related SO posts, and the closest I got was this:
from pandas.io.json import json_normalize
pd.DataFrame.from_dict(json_normalize(data,record_path=
['segments'],meta=[['annotation_spec','description'],
['annotation_spec','display_name']],errors='ignore'))
This gives me the following output:
>>> from pandas.io.json import json_normalize
>>> pd.DataFrame.from_dict(json_normalize(data,record_path=['segments'],meta=[['annotation_spec','description'],['annotation_spec','display_name']],errors='ignore'))
   confidence                                            segment annotation_spec.description annotation_spec.display_name
0    0.523021  {u'end_time_offset': {u'nanos': 973306000, u's...                Story_Driven                 Story_Driven
1    0.524438  {u'end_time_offset': {u'nanos': 973306000, u's...                        real                         real
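I want to break the "segment" column above into its individual components. How can I do that?
Answer 0 (score: 1)
Basically, json_normalize handles nested dictionaries. The problem here is the list under the segments key.
So, if that list always has length 1, we can remove the list wrapper and then apply json_normalize: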
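# Strip the list wrapper: if a value is a list, keep only its first element.
# This assumes every list has exactly one element.
def remove_list(dct):
    return {k: (v[0] if isinstance(v, list) else v) for k, v in dct.items()}
data_clean = [remove_list(entry) for entry in data]
# pandas >= 1.0 exposes this as pd.json_normalize; older versions need
# `from pandas.io.json import json_normalize` and a bare json_normalize(...) call.
pd.json_normalize(data_clean, sep="__")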
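If the segments list can hold more than one element, taking only the first one would silently drop data. Below is a minimal sketch of one way around that, in plain Python: it emits one flat row per segment and repeats the shared annotation_spec fields on each row. The helper names `flatten` and `explode_segments` are made up for this illustration and are not part of pandas; the `__` separator simply mirrors the `sep="__"` used above.

```python
def flatten(obj, prefix="", sep="__"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts; an empty dict contributes no keys.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def explode_segments(entries, list_key="segments", sep="__"):
    """Return one flat row per element of entry[list_key] (hypothetical helper)."""
    rows = []
    for entry in entries:
        # Fields shared by every segment of this entry.
        shared = flatten({k: v for k, v in entry.items() if k != list_key}, sep=sep)
        for item in entry.get(list_key, []):
            row = dict(shared)
            row.update(flatten(item, prefix=list_key, sep=sep))
            rows.append(row)
    return rows


data = [
    {"annotation_spec": {"description": "Story_Driven", "display_name": "Story_Driven"},
     "segments": [{"confidence": 0.52302074,
                   "segment": {"end_time_offset": {"nanos": 973306000, "seconds": 14},
                               "start_time_offset": {}}}]},
    {"annotation_spec": {"description": "real", "display_name": "real"},
     "segments": [{"confidence": 0.5244379,
                   "segment": {"end_time_offset": {"nanos": 973306000, "seconds": 14},
                               "start_time_offset": {}}}]},
]

rows = explode_segments(data)
```

Each element of `rows` is a flat dict, so `pd.DataFrame(rows)` should give one column per leaf field. Note that the empty `start_time_offset` dict produces no column at all in this sketch; if you need it, you would have to add a default value yourself.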