Pandas DataFrame column containing a list of dictionaries

Time: 2021-07-13 10:08:48

Tags: python pandas

Hi, I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    'info': [1.4, 3.6, 6.5],
    'new': [
        [{'score': 0.998, 'letters': 'C', 'temp': 1}, {'score': 1.343, 'letters': 'B', 'temp': 0}, {'score': 2.323, 'letters': 'F', 'temp': 1}],
        [{'score': 2.532, 'letters': 'D', 'temp': 1}, {'score': 2.123, 'letters': 'G', 'temp': 1}, {'score': 4.332, 'letters': 'S', 'temp': 0}],
        [{'score': 2.223, 'letters': 'C', 'temp': 0}, {'score': 1.144, 'letters': 'J', 'temp': 1}, {'score': 7.443, 'letters': 'G', 'temp': 9}],
    ],
})

df:

    info    new
0   1.4 [{'score': 0.998, 'letters': 'C', 'temp': 1}, ...
1   3.6 [{'score': 2.532, 'letters': 'D', 'temp': 1}, ...
2   6.5 [{'score': 2.223, 'letters': 'C', 'temp': 0}, ...

I am trying to extract the score and letters values from every dictionary and get the following output:

  info    score1    score2    score3   letters1   letters2   letters3
0 1.4     0.998     1.343     2.323    C          B          F
1 3.6     2.532     2.123     4.332    D          G          S
2 6.5     2.223     1.144     7.443    C          J          G

I have tried:

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0)
records = [x[0] for x in df['new']]  # takes only the first dictionary of each list

res = pd.concat([df.drop('new', axis=1), pd.json_normalize(records)], axis=1)

I have also tried:

df = df['new'].apply(lambda x: pd.Series({i['score']: i['letters'] for i in x if isinstance(x, list)}))

Any help would be great.

3 answers:

Answer 0 (score: 2)

Use a list comprehension containing a dict comprehension, with enumerate to number the new column names:

d = [{f'{k}{i}': v for i,y in enumerate(x, 1) for k,v in y.items()} for x in df['new']]

# Store the result under a new name so that df still has its 'new' column for the next step
df1 = pd.DataFrame(d, index=df.index).sort_index(axis=1)
print(df1)
  letters1 letters2 letters3  score1  score2  score3  temp1  temp2  temp3
0        C        B        F   0.998   1.343   2.323      1      0      1
1        D        G        S   2.532   2.123   4.332      1      1      0
2        C        J        G   2.223   1.144   7.443      0      1      9

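To keep only score and letters and attach them to the original DataFrame, filter the keys, take the column out with pop() and join() the result back:

d = [{f'{k}{i}': v for i,y in enumerate(x, 1) for k,v in y.items() if k in ['score','letters']} for x in df.pop('new')]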

df = df.join(pd.DataFrame(d, index=df.index).sort_index(axis=1))
print (df)
   info letters1 letters2 letters3  score1  score2  score3
0   1.4        C        B        F   0.998   1.343   2.323
1   3.6        D        G        S   2.532   2.123   4.332
2   6.5        C        J        G   2.223   1.144   7.443
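
The same idea can be wrapped in a small reusable function. This is only a sketch: the name widen_dict_list and its parameters are hypothetical, not part of pandas, and the sample frame is a shortened copy of the data in the question. Unlike the snippet above, it orders the columns key by key instead of alphabetically and leaves the input DataFrame unchanged.

```python
import pandas as pd


def widen_dict_list(df, column, keys):
    """Hypothetical helper: spread a column of lists of dicts into numbered columns."""
    rows = [
        {f"{k}{i}": d[k] for i, d in enumerate(items, 1) for k in keys if k in d}
        for items in df[column]
    ]
    wide = pd.DataFrame(rows, index=df.index)
    # Order columns key by key (score1, score2, ..., letters1, ...).
    ordered = [c for k in keys for c in wide.columns
               if c.startswith(k) and c[len(k):].isdigit()]
    # drop() returns a copy, so the caller's DataFrame is not modified.
    return df.drop(columns=column).join(wide[ordered])


# Shortened sample based on the question's data (two rows, two dicts each).
sample = pd.DataFrame({
    "info": [1.4, 3.6],
    "new": [
        [{"score": 0.998, "letters": "C", "temp": 1},
         {"score": 1.343, "letters": "B", "temp": 0}],
        [{"score": 2.532, "letters": "D", "temp": 1},
         {"score": 2.123, "letters": "G", "temp": 1}],
    ],
})

result = widen_dict_list(sample, "new", ["score", "letters"])
print(result)
```

Passing the keys explicitly keeps unwanted fields such as temp out of the result without a separate filtering step.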

Answer 1 (score: 0)

Try this:

df = df.explode('new')
#pd.json_normalize(df['new'])
df2 = pd.concat([df['info'].reset_index(drop = True), pd.json_normalize(df['new'])], axis = 1)

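explode() gives every dictionary in a list its own row, and json_normalize() then expands the dictionaries into columns. Note that the result is in long format (one row per dictionary), not the wide layout shown in the question.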
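
To get from that long result to the wide layout requested in the question, one possible approach is to number the dictionaries within each original row and unstack on that number. The following is a sketch under that assumption; the helper column name pos and the shortened sample data are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Shortened sample based on the question's data (two rows, three dicts each).
df = pd.DataFrame({
    "info": [1.4, 3.6],
    "new": [
        [{"score": 0.998, "letters": "C", "temp": 1},
         {"score": 1.343, "letters": "B", "temp": 0},
         {"score": 2.323, "letters": "F", "temp": 1}],
        [{"score": 2.532, "letters": "D", "temp": 1},
         {"score": 2.123, "letters": "G", "temp": 1},
         {"score": 4.332, "letters": "S", "temp": 0}],
    ],
})

# One row per dictionary; the original row label is repeated in the index.
exploded = df.explode("new")

# Expand the dictionaries into columns and restore the original row labels.
flat = pd.json_normalize(exploded["new"].tolist())
flat.index = exploded.index

# Position of each dictionary within its original list: 1, 2, 3, ...
flat["pos"] = (exploded.groupby(level=0).cumcount() + 1).to_numpy()

# Move the position into the columns, then flatten names to score1, letters1, ...
wide = flat.set_index("pos", append=True)[["score", "letters"]].unstack("pos")
wide.columns = [f"{name}{pos}" for name, pos in wide.columns]

result = df[["info"]].join(wide)
print(result)
```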

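Answer 2 (score: 0)

  • explode() the column and expand the dictionaries into columns
  • reshape the exploded and expanded DataFrame and join() it back onto the original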
df2 = (
    df.explode("new")["new"]
    .apply(pd.Series)
    .pipe(lambda d: d.assign(obs=d.groupby(level=0).cumcount() + 1))
    .set_index("obs", append=True)
    .unstack("obs")
)

df2.columns = ["".join(map(str, c)) for c in df2.columns.to_flat_index()]
df.loc[:, ["info"]].join(df2)

Output:

   info  score1  score2  score3 letters1 letters2 letters3  temp1  temp2  temp3
0   1.4   0.998   1.343   2.323        C        B        F      1      0      1
1   3.6   2.532   2.123   4.332        D        G        S      1      1      0
2   6.5   2.223   1.144   7.443        C        J        G      0      1      9
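
A possible variation on the expansion step: .apply(pd.Series) builds one Series per row, which tends to be slow on large frames. Building the expanded frame directly from the list of dictionaries should give the same values. This is a sketch on a shortened copy of the question's data, not a benchmark, and the variable names are illustrative.

```python
import pandas as pd

# Shortened sample based on the question's data (two rows, two dicts each).
df = pd.DataFrame({
    "info": [1.4, 6.5],
    "new": [
        [{"score": 0.998, "letters": "C", "temp": 1},
         {"score": 1.343, "letters": "B", "temp": 0}],
        [{"score": 2.223, "letters": "C", "temp": 0},
         {"score": 1.144, "letters": "J", "temp": 1}],
    ],
})

# One dictionary per row, with the original row label repeated in the index.
exploded = df.explode("new")["new"]

# Build the expanded frame in one call instead of .apply(pd.Series).
expanded = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(expanded)
```

The rest of the pipeline (assign the obs counter, set_index, unstack, join) can then continue from expanded unchanged.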