I have a DataFrame df with two columns named 'MovieName' and 'Actors'. It looks like:
MovieName   Actors
lights out  Maria Bello
legend      Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis
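Note that the different actor names are separated by '*'. I have another CSV file named gender.csv, which gives the gender of each actor based on their first name. gender.csv looks like: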
ActorName Gender
Tom male
Emily female
Christopher male
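I want to add two columns, 'female_actors' and 'male_actors', to my DataFrame, containing the number of female and male actors in each movie, respectively.
How can I do this in pandas using df and gender.csv?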
Note that the result for the example above should be:
MovieName   Actors                                                        male_actors  female_actors
lights out  Maria Bello                                                   0            0
legend      Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis  2            1
Answer 0 (score: 3)
import pandas as pd

# Sample data from the question; df2 stands in for the contents of gender.csv.
df1 = pd.DataFrame({'Actors': ['Maria Bello', 'Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis'], 'MovieName': ['lights out', 'legend']})
df2 = pd.DataFrame({'ActorName': ['Tom', 'Emily', 'Christopher'], 'Gender': ['male', 'female', 'male']})

def func(actors, gender):
    # Keep only the first name of each '*'-separated actor.
    actors = [act.split()[0] for act in actors.split('*')]
    # Count lookup rows that have the requested gender and a matching first name.
    mask = (df2.Gender == gender) & df2.ActorName.isin(actors)
    return int(mask.sum())

df1['male_actors'] = df1.Actors.apply(lambda x: func(x, 'male'))
df1['female_actors'] = df1.Actors.apply(lambda x: func(x, 'female'))
df1.to_csv('res.csv', index=False)
print(df1)
Output
Actors,MovieName,male_actors,female_actors
Maria Bello,lights out,0,0
Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis,legend,2,1
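
For larger frames, the per-row apply can probably be replaced by a vectorized pass. The following is only a sketch under some assumptions: it needs a pandas version that provides Series.explode, each first name appears at most once in the lookup table, and the names gender, gender_by_name, first_names and genders are made up for illustration. The lookup frame is built inline here; with a real file it would presumably come from pd.read_csv('gender.csv'), assuming that file is comma-separated with ActorName and Gender headers.

```python
import pandas as pd

df = pd.DataFrame({
    'MovieName': ['lights out', 'legend'],
    'Actors': ['Maria Bello',
               'Tom Hardy*Emily Browning*Christopher Eccleston*David Thewlis'],
})
# Hypothetical stand-in for the contents of gender.csv.
gender = pd.DataFrame({
    'ActorName': ['Tom', 'Emily', 'Christopher'],
    'Gender': ['male', 'female', 'male'],
})

# Lookup Series: first name -> gender (assumes ActorName values are unique).
gender_by_name = gender.set_index('ActorName')['Gender']

# One entry per (movie, actor); explode keeps the original row label,
# so entries from the same movie share the same index value.
first_names = (
    df['Actors']
    .str.split('*')   # a one-character pattern is treated as a literal
    .explode()
    .str.split()      # split each full name on whitespace
    .str[0]           # keep the first name only
)

# First names missing from the lookup map to NaN and are never counted.
genders = first_names.map(gender_by_name)

for g in ('male', 'female'):
    # Count matches per original row label, then align back onto df.
    df[g + '_actors'] = (genders == g).groupby(level=0).sum().astype(int)

print(df)
```

The counts line up with the original rows because explode repeats each row's index label, so groupby(level=0) collapses the per-actor matches back to one value per movie.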