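Explode multiple columns of dictionaries into pandas DataFrame rows

Time: 2018-06-15 11:29:08

Tags: python pandas dictionary dataframe

I have the following pandas DataFrame with 3 columns. Two of them hold dictionaries or lists of dictionaries, so I want to explode only those two columns, using the 'actor' key for cast and the 'name' key for genre.

I tried the following approach:

Code: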

    import pandas as pd

    df = (pd.DataFrame({'name': ['Hello', 'World', 'Test'],
                        'cast': [[{"gender": 0, "id": 2423, "actor": "Bruno Delbonnel"},
                                  {"gender": 1, "id": 1234, "actor": "Alex"}],
                                 {"gender": 1, "id": 2424, "actor": "Stuart"},
                                 {"gender": 2, "id": 2425, "actor": "Kate"}],
                        'genre': [{"id": 2343, "name": "magic"},
                                  [{"id": 616, "name": "witch"}, {"id": 2765, "name": "wizardry"}],
                                  {"id": 3872, "name": "broom"}]})
          .set_index(['name']))

    df.reset_index(inplace=True)
    output = []

    _ = df.apply(lambda row: [output.append([row['name'], row['cast']['actor'], row['genre']['name']])],
                 axis=1)

    df_new = pd.DataFrame(output, columns=['name', 'cast', 'genre'])

DataFrame:

    {'name': ['Hello', 'World', 'Test'],
     'cast': [[{"gender": 0, "id": 2423, "actor": "Bruno Delbonnel"},
               {"gender": 1, "id": 1234, "actor": "Alex"}],
              {"gender": 1, "id": 2424, "actor": "Stuart"},
              {"gender": 2, "id": 2425, "actor": "Kate"}],
     'genre': [{"id": 2343, "name": "magic"},
               [{"id": 616, "name": "witch"}, {"id": 2765, "name": "wizardry"}],
               {"id": 3872, "name": "broom"}]}

Expected output:

    name     cast               genre
    Hello    Bruno Delbonnel    magic
    Hello    Alex               magic
    World    Stuart             witch
    World    Stuart             wizardry
    Test     Kate               broom

But because some cells hold a list of dictionaries, I can't do row['cast']['actor'] or row['genre']['name']. So how can I achieve this?
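
For illustration, the failure can be reproduced without pandas. The sketch below copies two 'cast' cells from the data above: indexing the list cell with a string key raises a TypeError, while indexing each dictionary inside it works. The variable names (cast_list, cast_dict, error_message) are made up for this example.

```python
# Two 'cast' cells copied from the question's data:
# one is a list of dicts, the other a bare dict.
cast_list = [{"gender": 0, "id": 2423, "actor": "Bruno Delbonnel"},
             {"gender": 1, "id": 1234, "actor": "Alex"}]
cast_dict = {"gender": 1, "id": 2424, "actor": "Stuart"}

# A bare dict can be indexed by key directly.
single_actor = cast_dict["actor"]

# A list cannot be indexed by a string key, so this raises TypeError.
try:
    cast_list["actor"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Indexing each dict inside the list works.
actors_in_list = [item["actor"] for item in cast_list]

print(single_actor)
print(error_message)
print(actors_in_list)
```

So each cell has to be checked for its type before the key lookup.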

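1 Answer:

Answer 0: (score: 1)

You can apply a function that handles both list and dictionary elements in a series.

Then, to expand your dataframe, repeat or chain the items as needed.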

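from itertools import chain

import numpy as np
import pandas as pd

def get_val(x, var):
    # Normalise a cell to a list of values: a bare dict gives a one-item list,
    # a list of dicts gives one value per dict.
    if not isinstance(x, list):
        return [x[var]]
    else:
        return [i[var] for i in x]

# Replace each cell with the list of actor / genre names it contains.
df['cast'] = df['cast'].apply(get_val, var='actor')
df['genre'] = df['genre'].apply(get_val, var='name')

# Repeat each name once per cast member, then flatten both list columns.
# The flattened 'cast' and 'genre' lists must have the same total length.
res = pd.DataFrame({'name': np.repeat(df['name'], df['cast'].map(len)),
                    'cast': list(chain.from_iterable(df['cast'])),
                    'genre': list(chain.from_iterable(df['genre']))})

print(res)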

              cast     genre   name
0  Bruno Delbonnel     witch  Hello
0             Alex  wizardry  Hello
1           Stuart     magic  World
2             Kate     broom   Test
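
The approach above pairs the flattened cast and genre values by position, so it only works when both flattened lists have the same total length, and it does not build the per-row combinations shown in the question's expected output. A minimal sketch of one way to get those combinations, assuming a per-row Cartesian product is what is wanted, uses itertools.product. This is an alternative technique, not the answerer's method. The names rows and res_product are made up for this example, and the data is rebuilt from the question so the block runs on its own.

```python
from itertools import product

import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'name': ['Hello', 'World', 'Test'],
    'cast': [[{"gender": 0, "id": 2423, "actor": "Bruno Delbonnel"},
              {"gender": 1, "id": 1234, "actor": "Alex"}],
             {"gender": 1, "id": 2424, "actor": "Stuart"},
             {"gender": 2, "id": 2425, "actor": "Kate"}],
    'genre': [{"id": 2343, "name": "magic"},
              [{"id": 616, "name": "witch"}, {"id": 2765, "name": "wizardry"}],
              {"id": 3872, "name": "broom"}],
})

def get_val(x, var):
    # Same idea as the helper above: always return a list of values.
    if not isinstance(x, list):
        return [x[var]]
    return [i[var] for i in x]

rows = []
for name, cast, genre in zip(df['name'], df['cast'], df['genre']):
    actors = get_val(cast, 'actor')
    genres = get_val(genre, 'name')
    # product() yields every (actor, genre) pair for this one row.
    rows.extend((name, a, g) for a, g in product(actors, genres))

res_product = pd.DataFrame(rows, columns=['name', 'cast', 'genre'])
print(res_product)
```

With pandas 0.25 or later, DataFrame.explode may be a shorter route once each cell has been converted to a list, though that is not part of the original answer.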