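I've been trying to parse a nested dictionary inside a data frame. I built this df from a dict, but I can't figure out how to handle the nested part.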
df
First second third
0 1 2 {nested dict}
Nested dict:
{'fourth': '4', 'fifth': '5', 'sixth': '6'}, {'fourth': '7', 'fifth': '8', 'sixth': '9'}
My expected output is:
First second fourth fifth sixth fourth fifth sixth
0 1 2 4 5 6 7 8 9
Edit: the original dictionary
'archi': [{'fourth': '115',
'fifth': '-162',
'sixth': '112'},
{'fourth': '52',
'fifth': '42',
'sixth': ' 32'}]
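Answer 0 (score: 1)
I can't tell the exact format of the nested dict in your "third" column, but here is what I'd suggest, using "Python: Pandas dataframe from Series of dict" as a starting point. Here is a reproducible dictionary and data frame: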
import pandas as pd

nst_dict = {'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                      {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}
df = pd.DataFrame.from_dict({'First': [1, 2], 'Second': [2, 3],
                             'third': [nst_dict, nst_dict]})
You then need to access the list inside the dictionary first, and then the items in that list:
thrd_1 = df['third'].apply(lambda x: x['archi'])  # extract the list of dicts
thrd_1a = thrd_1.apply(lambda x: x[0])  # first item of each list
thrd_1b = thrd_1.apply(lambda x: x[1])  # second item of each list
# Expand each dict into its own columns, then join everything on the index.
out = df.drop('third', axis=1).merge(
    thrd_1a.apply(pd.Series).merge(thrd_1b.apply(pd.Series),
                                   left_index=True, right_index=True),
    left_index=True, right_index=True)
print(out)
   First  Second fourth_x fifth_x sixth_x fourth_y fifth_y sixth_y
0      1       2      115    -162     112       52      42      32
1      2       3      115    -162     112       52      42      32
I'll try to clean this up with collections.abc and turn it into a function, but this should solve your specific case.
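As a rough idea of what such a function could look like, here is a minimal sketch. It does not use collections.abc; it assumes every cell in the column is a dict whose `key` entry is a list of flat dicts. The helper name `expand_nested` and the `<field>_<position>` column naming are illustrative assumptions, not part of pandas.

```python
import pandas as pd


def expand_nested(df, column, key):
    """Expand df[column][key] (a list of flat dicts) into numbered columns.

    Hypothetical helper: the name and the `<field>_<position>` naming
    scheme are illustrative choices, not a pandas API.
    """
    rows = []
    for cell in df[column]:
        flat = {}
        # Number each dict by its position in the list so names stay unique.
        for pos, item in enumerate(cell[key]):
            for field, value in item.items():
                flat[f"{field}_{pos}"] = value
        rows.append(flat)
    expanded = pd.DataFrame(rows, index=df.index)
    # Replace the nested column with the expanded columns.
    return df.drop(columns=column).join(expanded)


nst_dict = {'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                      {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}
df = pd.DataFrame({'First': [1, 2], 'Second': [2, 3],
                   'third': [nst_dict, nst_dict]})
out = expand_nested(df, 'third', 'archi')
print(out)
```

Numbering the columns by list position avoids the `_x`/`_y` suffixes from `merge` and also works when the list holds more than two dicts.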
Answer 1 (score: 0)
A "brute force" approach:
import pandas as pd
import numpy as np

my_dict = {'Zero': 0, 'First': 1, 'Second': 2,
           'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                     {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}
data_row = []
columns = []
for key in my_dict.keys():
    try:
        if len(my_dict[key]):  # len() raises TypeError for scalars such as ints
            for item in my_dict[key]:
                # iterate over nested dicts
                for k, v in item.items():
                    columns.append(k)
                    data_row.append(v)
    except TypeError:
        # scalar value: keep it as a single column
        data_row.append(my_dict[key])
        columns.append(key)
print(columns)
print(data_row)
# a single row of 9 values; np.array casts the mixed ints and strings to strings
data = np.array(data_row).reshape(1, 9)
df = pd.DataFrame(data, columns=columns)
print(df)
Output:
Zero First Second fourth fifth sixth fourth fifth sixth
0 0 1 2 115 -162 112 52 42 32
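The result above has repeated column names (`fourth`, `fifth`, `sixth` each appear twice), which makes selecting a single column awkward, and the NumPy round trip turns the integers into strings. One possible variant, sketched here under the assumption that every list value contains only flat dicts, builds a single row dict with position-suffixed keys. The `_0`/`_1` suffix scheme and the name `df_unique` are illustrative choices.

```python
import pandas as pd

my_dict = {'Zero': 0, 'First': 1, 'Second': 2,
           'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                     {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}

row = {}
for key, value in my_dict.items():
    if isinstance(value, list):
        # Assumed: each list element is a flat dict.
        for pos, item in enumerate(value):
            for k, v in item.items():
                row[f"{k}_{pos}"] = v  # suffix keeps column names unique
    else:
        row[key] = value  # scalar: keep as a single column

# A list with one dict produces a one-row DataFrame.
df_unique = pd.DataFrame([row])
print(df_unique)
```

Because the values never pass through a NumPy array, the scalar columns keep their integer type while the nested values stay as the original strings.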
Answer 2 (score: 0)
I wrote a recursive function to flatten the dict structure, and then create the data frame from the flattened result:
original_dict = {'Zero': 0, 'First': 1, 'Second': 2,
                 'archi': [{'fourth': '115', 'fifth': '-162', 'sixth': '112'},
                           {'fourth': '52', 'fifth': '42', 'sixth': ' 32'}]}
flattened_dict = {}

def flatten(obj, name=''):
    # Recurse into dicts and lists; store each leaf value under its own key.
    # Note: keys repeated across list items (e.g. 'fourth') overwrite earlier values.
    if isinstance(obj, dict):
        for key, value in obj.items():
            flatten(value, key)
    elif isinstance(obj, list):
        for e in obj:
            flatten(e)
    else:
        flattened_dict[name] = [obj]

flatten(original_dict)
With the following output: