Count how many other actors each actor has worked with

Time: 2018-11-10 06:37:59

Tags: python pandas dataframe

I have a DataFrame:

movie     |      cast
------------------------------
movie1    |  cast1,cast2,cast3
movie2    |  cast4,cast1,cast6,cast7
movie3    |  cast4,cast3,cast5

import pandas as pd
df = pd.DataFrame({'movie': ['movie1','movie2','movie3'], 'cast': ['cast1,cast2,cast3','cast4,cast1,cast6,cast7','cast4,cast3,cast5']})

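The result I would like to get is, for each cast member, the total number of co-stars across all of their movies: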

cast   |      count
------------------------------
cast1  |  5 
cast2  |  2
cast3  |  4
cast4  |  5
cast5  |  2
cast6  |  3
cast7  |  3

To do this, I tried:

df_cast = df.join(df.cast
              .str.strip(',')
              .str.split(',',expand=True)
              .stack()
              .reset_index(level=1,drop=True)
              .rename('cast_member')).reset_index(drop=True)

This adds a new column, cast_member, in which each cell holds a single cast member's name. I tried using groupby('cast_member'), but I'm not sure how to proceed from there.
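For reference, on pandas 0.25 or later the same one-row-per-cast-member reshaping can likely be written more compactly with `Series.str.split` followed by `DataFrame.explode`. This is a sketch of an alternative, not the code used in the question; it assumes the same `movie` and `cast` columns as above.

```python
import pandas as pd

df = pd.DataFrame({
    'movie': ['movie1', 'movie2', 'movie3'],
    'cast': ['cast1,cast2,cast3', 'cast4,cast1,cast6,cast7', 'cast4,cast3,cast5'],
})

# Split each comma-separated string into a list, then emit one row per list element.
df_cast = (
    df.assign(cast_member=df['cast'].str.strip(',').str.split(','))
      .explode('cast_member')
      .reset_index(drop=True)
)
print(df_cast)
```

The resulting frame has the same three columns (movie, cast, cast_member) as the stack-based version, with one row per movie and cast member pair.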

I'm new to pandas, so I'd really appreciate an answer even if it turns out to be simple.

1 answer:

Answer 0 (score: 3)

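Use GroupBy.transform to add a new column that holds the cast size of each movie, so every row carries the number of cast members in its movie: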

# Number of rows (cast members) per movie, broadcast back to every row
df_cast['cast_count'] = df_cast.groupby('movie')['movie'].transform('size')
print(df_cast)
    movie                     cast cast_member   cast_count
0  movie1        cast1,cast2,cast3       cast1            3
1  movie1        cast1,cast2,cast3       cast2            3
2  movie1        cast1,cast2,cast3       cast3            3
3  movie2  cast4,cast1,cast6,cast7       cast4            4
4  movie2  cast4,cast1,cast6,cast7       cast1            4
5  movie2  cast4,cast1,cast6,cast7       cast6            4
6  movie2  cast4,cast1,cast6,cast7       cast7            4
7  movie3        cast4,cast3,cast5       cast4            3
8  movie3        cast4,cast3,cast5       cast3            3
9  movie3        cast4,cast3,cast5       cast5            3
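To make the role of `transform` clearer, here is a small sketch on a made-up three-row frame (the data is illustrative only). A plain `size()` returns one value per group, whereas `transform('size')` returns one value per original row, which is why it can be assigned directly as a new column.

```python
import pandas as pd

# Illustrative data only: two cast members in movie1, one in movie2.
df_cast = pd.DataFrame({
    'movie': ['movie1', 'movie1', 'movie2'],
    'cast_member': ['cast1', 'cast2', 'cast1'],
})

# One value per movie (index is the movie name).
per_group = df_cast.groupby('movie').size()

# One value per row, aligned with df_cast's index.
per_row = df_cast.groupby('movie')['movie'].transform('size')

print(per_group)
print(per_row)
```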

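Then aggregate cast_count per cast_member with both size and sum, and subtract size from sum to get the final count. Each row's cast_count includes the cast member themselves, so subtracting size (the number of movies they appear in) removes those self-counts: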

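# size = number of movies per cast member; sum = total cast size across those movies
df = df_cast.groupby('cast_member')['cast_count'].agg(['size','sum'])
# Subtract one per movie so a cast member is not counted as their own co-star
df1 = df['sum'].sub(df['size']).rename('count').reset_index()
print(df1)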
  cast_member  count
0       cast1      5
1       cast2      2
2       cast3      4
3       cast4      5
4       cast5      2
5       cast6      3
6       cast7      3
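The two steps can also be sketched end to end as a single function. The name `costar_counts` is hypothetical, and this version uses `explode` (pandas 0.25 or later) in place of the stack-based split, so treat it as one possible arrangement under those assumptions rather than the answer's exact code.

```python
import pandas as pd

def costar_counts(df):
    """Hypothetical helper: for each cast member, sum (cast size - 1) over their movies."""
    cast = (
        df.assign(cast_member=df['cast'].str.strip(',').str.split(','))
          .explode('cast_member')
          .reset_index(drop=True)
    )
    # Cast size of the movie on every row, minus one for the cast member themselves.
    cast['others'] = cast.groupby('movie')['movie'].transform('size') - 1
    return cast.groupby('cast_member')['others'].sum().rename('count').reset_index()

df = pd.DataFrame({
    'movie': ['movie1', 'movie2', 'movie3'],
    'cast': ['cast1,cast2,cast3', 'cast4,cast1,cast6,cast7', 'cast4,cast3,cast5'],
})

result = costar_counts(df)
print(result)
```

Note that this counts co-appearances rather than distinct people: two cast members who share two movies contribute 2 to each other's total. It also assumes that each movie name appears in only one row of the input.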