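I have the following pandas DataFrame with 3 columns. Two of them hold lists of dictionaries, so I want to explode just those two columns on the 'actor' and 'name' keys of the dictionaries.
I tried the following:
Code: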
import pandas as pd
df = (pd.DataFrame({'name': ['Hello', 'World', 'Test'],
                    'cast': [[{"gender": 0, "id": 2423, "actor": "Bruno Delbonnel"},
                              {"gender": 1, "id": 1234, "actor": "Alex"}],
                             {"gender": 1, "id": 2424, "actor": "Stuart"},
                             {"gender": 2, "id": 2425, "actor": "Kate"}],
                    'genre': [{"id": 2343, "name": "magic"},
                              [{"id": 616, "name": "witch"}, {"id": 2765, "name": "wizardry"}],
                              {"id": 3872, "name": "broom"}]})
        .set_index(['name']))
df.reset_index(inplace=True)
output = []
_ = df.apply(lambda row: [output.append([row['name'], row['cast']['actor'], row['genre']['name']])],
             axis=1)
df_new = pd.DataFrame(output, columns=['name', 'cast', 'genre'])
DataFrame:
{'name': ['Hello', 'World', 'Test'],
 'cast': [[{"gender": 0, "id": 2423, "actor": "Bruno Delbonnel"},
           {"gender": 1, "id": 1234, "actor": "Alex"}],
          {"gender": 1, "id": 2424, "actor": "Stuart"},
          {"gender": 2, "id": 2425, "actor": "Kate"}],
 'genre': [{"id": 2343, "name": "magic"},
           [{"id": 616, "name": "witch"}, {"id": 2765, "name": "wizardry"}],
           {"id": 3872, "name": "broom"}]}
Expected output:
name    cast              genre
Hello   Bruno Delbonnel   magic
Hello   Alex              magic
World   Stuart            witch
World   Stuart            wizardry
Test    Kate              broom
But since a cell can be a list of dictionaries, I cannot do row['cast']['actor'], row['genre']['name']. So how can I achieve this?
Answer 0 (score: 1)
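You can apply a function that handles both list and dictionary elements in a series.
Then, to expand your DataFrame, repeat or chain the items as needed.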
from itertools import chain
import numpy as np

def get_val(x, var):
    # Normalize each cell to a list of values for the given key,
    # whether the cell holds a single dict or a list of dicts.
    if not isinstance(x, list):
        return [x[var]]
    else:
        return [i[var] for i in x]

df['cast'] = df['cast'].apply(get_val, var='actor')
df['genre'] = df['genre'].apply(get_val, var='name')

# Repeat each name once per cast entry, then flatten the list columns.
res = pd.DataFrame({'name': np.repeat(df['name'], df['cast'].map(len)),
                    'cast': list(chain.from_iterable(df['cast'])),
                    'genre': list(chain.from_iterable(df['genre']))})
print(res)
              cast     genre   name
0  Bruno Delbonnel     witch  Hello
0             Alex  wizardry  Hello
1           Stuart     magic  World
2             Kate     broom   Test
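One caveat with the chaining approach: it pairs cast and genre values purely by position, so it assumes both columns flatten to the same total length and line up row by row. If what you want is every actor paired with every genre within a row, as in the table in the question, a per-row cross product may be closer. A minimal sketch, assuming plain Python lists stand in for the DataFrame columns (names, cast, genre and rows are illustrative names, not part of the original answer):

```python
from itertools import product

def get_val(x, var):
    # Same helper as in the answer: accept a single dict or a list of dicts.
    if not isinstance(x, list):
        return [x[var]]
    return [i[var] for i in x]

# Sample data copied from the question.
names = ['Hello', 'World', 'Test']
cast = [[{"gender": 0, "id": 2423, "actor": "Bruno Delbonnel"},
         {"gender": 1, "id": 1234, "actor": "Alex"}],
        {"gender": 1, "id": 2424, "actor": "Stuart"},
        {"gender": 2, "id": 2425, "actor": "Kate"}]
genre = [{"id": 2343, "name": "magic"},
         [{"id": 616, "name": "witch"}, {"id": 2765, "name": "wizardry"}],
         {"id": 3872, "name": "broom"}]

# For each original row, pair every actor with every genre name.
rows = [(n, a, g)
        for n, c, gs in zip(names, cast, genre)
        for a, g in product(get_val(c, 'actor'), get_val(gs, 'name'))]

for row in rows:
    print(row)
```

If pandas is available, pd.DataFrame(rows, columns=['name', 'cast', 'genre']) should turn these tuples into a frame. On pandas 0.25 or later, applying get_val to each column and then calling DataFrame.explode on 'cast' and on 'genre' in turn is another route to the same per-row cross product.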