Hi, I have the following pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'info':[1.4,3.6,6.5], 'new':[[{'score':0.998, 'letters':'C', 'temp':1}, {'score':1.343, 'letters':'B', 'temp':0}, {'score':2.323, 'letters':'F', 'temp':1}], [{'score':2.532, 'letters':'D', 'temp':1}, {'score':2.123, 'letters':'G', 'temp':1}, {'score':4.332, 'letters':'S', 'temp':0}], [{'score':2.223, 'letters':'C', 'temp':0}, {'score':1.144, 'letters':'J', 'temp':1}, {'score':7.443, 'letters':'G', 'temp':9}]]})
df:
info new
0 1.4 [{'score': 0.998, 'letters': 'C', 'temp': 1}, ...
1 3.6 [{'score': 2.532, 'letters': 'D', 'temp': 1}, ...
2 6.5 [{'score': 2.223, 'letters': 'C', 'temp': 0}, ...
I am trying to pull the score and letters values out of each dictionary and get the following output:
info score1 score2 score3 letters1 letters2 letters3
0 1.4 0.998 1.343 2.323 C B F
1 3.6 2.532 2.123 4.332 D G S
2 6.5 2.223 1.144 7.443 C J G
I have already tried:
from pandas import json_normalize  # pandas >= 1.0 (older versions: from pandas.io.json import json_normalize)
records = [x[0] for x in df['new']]
res = pd.concat([df.drop('new', axis=1), json_normalize(records)], axis=1)
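The attempt above keeps only x[0], the first dict of each list, so two of the three dicts per row are lost. A minimal sketch, assuming pandas >= 1.0 (where json_normalize is available as pd.json_normalize): passing whole rows with record_path='new' flattens every dict, and meta=['info'] carries the parent 'info' value along. Note this still gives one row per dict (long format), not the wide layout above.

```python
import pandas as pd

df = pd.DataFrame({
    'info': [1.4, 3.6, 6.5],
    'new': [
        [{'score': 0.998, 'letters': 'C', 'temp': 1},
         {'score': 1.343, 'letters': 'B', 'temp': 0},
         {'score': 2.323, 'letters': 'F', 'temp': 1}],
        [{'score': 2.532, 'letters': 'D', 'temp': 1},
         {'score': 2.123, 'letters': 'G', 'temp': 1},
         {'score': 4.332, 'letters': 'S', 'temp': 0}],
        [{'score': 2.223, 'letters': 'C', 'temp': 0},
         {'score': 1.144, 'letters': 'J', 'temp': 1},
         {'score': 7.443, 'letters': 'G', 'temp': 9}],
    ],
})

# record_path='new': emit one output row per dict inside each 'new' list
# meta=['info']: copy the parent row's 'info' value onto each of those rows
long = pd.json_normalize(df.to_dict('records'), record_path='new', meta=['info'])
print(long)
```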
I also tried:
df = df['new'].apply(lambda x: pd.Series({i['score']: i['letters'] for i in x if isinstance(x, list)}))
Any help would be appreciated.
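The second attempt builds {score: letters}, so the score values themselves become column labels: with this data that is nine columns, one per distinct score, each row filled in only three of them. Assigning the result back to df also drops 'info'. A minimal sketch of a fix that keeps the same apply idea but numbers the keys by the dict's position; flatten and keys are hypothetical names introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'info': [1.4, 3.6, 6.5],
    'new': [
        [{'score': 0.998, 'letters': 'C', 'temp': 1},
         {'score': 1.343, 'letters': 'B', 'temp': 0},
         {'score': 2.323, 'letters': 'F', 'temp': 1}],
        [{'score': 2.532, 'letters': 'D', 'temp': 1},
         {'score': 2.123, 'letters': 'G', 'temp': 1},
         {'score': 4.332, 'letters': 'S', 'temp': 0}],
        [{'score': 2.223, 'letters': 'C', 'temp': 0},
         {'score': 1.144, 'letters': 'J', 'temp': 1},
         {'score': 7.443, 'letters': 'G', 'temp': 9}],
    ],
})

def flatten(row_dicts, keys=('score', 'letters')):
    # hypothetical helper: label each value by key + 1-based dict position;
    # keys are the outer loop, so the order is score1, score2, score3, letters1, ...
    return pd.Series({f'{k}{n}': d[k] for k in keys for n, d in enumerate(row_dicts, 1)})

# each returned Series becomes one row of a new DataFrame
wide = df['new'].apply(flatten)
# join on the shared index instead of overwriting df, so 'info' is kept
result = df[['info']].join(wide)
print(result)
```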
Answer 0 (score: 2)
Use a list comprehension with a nested dict comprehension, with enumerate building the new numbered column names:
# one dict per row with numbered keys: score1, letters1, temp1, score2, ...
d = [{f'{k}{i}': v for i,y in enumerate(x, 1) for k,v in y.items()} for x in df['new']]
df1 = pd.DataFrame(d, index=df.index).sort_index(axis=1)
print(df1)
letters1 letters2 letters3 score1 score2 score3 temp1 temp2 temp3
0 C B F 0.998 1.343 2.323 1 0 1
1 D G S 2.532 2.123 4.332 1 1 0
2 C J G 2.223 1.144 7.443 0 1 9
If you only need score and letters, filter the keys and join the result back onto info:
d = [{f'{k}{i}': v for i,y in enumerate(x, 1) for k,v in y.items() if k in ['score','letters']} for x in df.pop('new')]
df = df.join(pd.DataFrame(d, index=df.index).sort_index(axis=1))
print(df)
info letters1 letters2 letters3 score1 score2 score3
0 1.4 C B F 0.998 1.343 2.323
1 3.6 D G S 2.532 2.123 4.332
2 6.5 C J G 2.223 1.144 7.443
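sort_index(axis=1) orders the columns alphabetically, so letters1..3 come before score1..3, unlike the order asked for in the question. A minimal sketch, assuming you want exactly score first, then letters: build the column list explicitly and reindex instead of sorting. keys, n and cols are names introduced here for illustration; reindex also inserts NaN columns for rows whose list is shorter than the longest one.

```python
import pandas as pd

df = pd.DataFrame({
    'info': [1.4, 3.6, 6.5],
    'new': [
        [{'score': 0.998, 'letters': 'C', 'temp': 1},
         {'score': 1.343, 'letters': 'B', 'temp': 0},
         {'score': 2.323, 'letters': 'F', 'temp': 1}],
        [{'score': 2.532, 'letters': 'D', 'temp': 1},
         {'score': 2.123, 'letters': 'G', 'temp': 1},
         {'score': 4.332, 'letters': 'S', 'temp': 0}],
        [{'score': 2.223, 'letters': 'C', 'temp': 0},
         {'score': 1.144, 'letters': 'J', 'temp': 1},
         {'score': 7.443, 'letters': 'G', 'temp': 9}],
    ],
})

keys = ['score', 'letters']             # keys to keep, in the order wanted
n = df['new'].map(len).max()            # the longest list decides how many columns
cols = [f'{k}{i}' for k in keys for i in range(1, n + 1)]

d = [{f'{k}{i}': v for i, y in enumerate(x, 1) for k, v in y.items() if k in keys}
     for x in df['new']]
# reindex instead of sort_index: fixes the column order explicitly
out = df[['info']].join(pd.DataFrame(d, index=df.index).reindex(columns=cols))
print(out)
```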
Answer 1 (score: 0)
Try this:
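df = df.explode('new')  # one row per dict; the index repeats (0,0,0,1,1,1,...)
# pass a plain list so the normalized frame gets a fresh 0..n-1 index matching reset_index
df2 = pd.concat([df['info'].reset_index(drop = True), pd.json_normalize(df['new'].tolist())], axis = 1)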
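explode takes each dict out of its list (one row per dict), and json_normalize then flattens the dicts into score, letters and temp columns. Note that this gives a long table with one row per dict rather than the wide layout asked for.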
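The long result of this answer can be reshaped into the requested wide layout. A minimal sketch, assuming every list in 'new' is non-empty, that numbers each dict within its original row and then uses DataFrame.pivot; the helper columns row and pos are names introduced here. It starts from the original df, since the code above overwrites df.

```python
import pandas as pd

df = pd.DataFrame({
    'info': [1.4, 3.6, 6.5],
    'new': [
        [{'score': 0.998, 'letters': 'C', 'temp': 1},
         {'score': 1.343, 'letters': 'B', 'temp': 0},
         {'score': 2.323, 'letters': 'F', 'temp': 1}],
        [{'score': 2.532, 'letters': 'D', 'temp': 1},
         {'score': 2.123, 'letters': 'G', 'temp': 1},
         {'score': 4.332, 'letters': 'S', 'temp': 0}],
        [{'score': 2.223, 'letters': 'C', 'temp': 0},
         {'score': 1.144, 'letters': 'J', 'temp': 1},
         {'score': 7.443, 'letters': 'G', 'temp': 9}],
    ],
})

exploded = df.explode('new')                          # index repeats: 0,0,0,1,1,1,2,2,2
flat = pd.json_normalize(exploded['new'].tolist())    # one row per dict, index 0..8
flat['row'] = exploded.index.to_numpy()               # original row label of each dict
flat['pos'] = flat.groupby('row').cumcount() + 1      # 1-based position within its list

# columns become a MultiIndex (value name, position), e.g. ('score', 1)
wide = flat.pivot(index='row', columns='pos', values=['score', 'letters'])
wide.columns = [f'{k}{p}' for k, p in wide.columns]   # ('score', 1) -> 'score1'
out = df[['info']].join(wide)
print(out)
```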
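Answer 2 (score: 0)
explode() the lists and expand each dict into columns, number the dicts within each row and unstack them into wide columns, then join() the result back onto the original frame:
df2 = (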
df.explode("new")["new"]
.apply(pd.Series)
.pipe(lambda d: d.assign(obs=d.groupby(level=0).cumcount() + 1))
.set_index("obs", append=True)
.unstack("obs")
)
df2.columns = ["".join(map(str, c)) for c in df2.columns.to_flat_index()]
df.loc[:, ["info"]].join(df2)
|   | info | score1 | score2 | score3 | letters1 | letters2 | letters3 | temp1 | temp2 | temp3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.4 | 0.998 | 1.343 | 2.323 | C | B | F | 1 | 0 | 1 |
| 1 | 3.6 | 2.532 | 2.123 | 4.332 | D | G | S | 1 | 1 | 0 |
| 2 | 6.5 | 2.223 | 1.144 | 7.443 | C | J | G | 0 | 1 | 9 |