pandas - How to aggregate two columns and keep all other columns

Time: 2015-02-13 17:51:55

Tags: python pandas

I have the following df (excerpt):

  movie id       movie title release date                                           IMDb URL                      genre  user id  rating  
0         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller        5       3  
1         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      268       2  
2         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      276       4  
3         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      217       3  
4         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller       87       4  

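What I'm looking for is to count 'user id' and average 'rating' while keeping all the other columns intact. So the result would look something like this: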

  movie id       movie title release date                                           IMDb URL                      genre  user id     rating  
0         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      50       3.75  
1         3  Four Rooms (1995)    1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      35       2.34  

Any ideas on how to do this?

Thanks

1 Answer:

Answer 0 (score: 6)

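If the values in the columns you are not aggregating are the same within each group, you can avoid a join by including those columns in the groupby keys.

Then pass a dictionary of functions to agg. If you set as_index to False, the columns you grouped by are kept as regular columns rather than becoming the index: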

df.groupby(['movie id','movie title','release date','IMDb URL','genre'], as_index=False).agg({'user id':len,'rating':'mean'})

Note that len is used here to do the count.
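
For anyone who wants to try this out, here is a minimal self-contained sketch of the same approach. The five GoldenEye rows are taken from the question; the two "Four Rooms" rows (their user ids, ratings and genre) are made up purely so that there is a second group, and the IMDb URL column is left out to keep the example short.

```python
import pandas as pd

# The first five rows come from the question. The "Four Rooms" rows are
# hypothetical values, added only so the output has more than one group.
df = pd.DataFrame({
    "movie id": [2, 2, 2, 2, 2, 3, 3],
    "movie title": ["GoldenEye (1995)"] * 5 + ["Four Rooms (1995)"] * 2,
    "release date": ["1-Jan-95"] * 7,
    "genre": ["Action|Adventure|Thriller"] * 5 + ["Thriller"] * 2,
    "user id": [5, 268, 276, 217, 87, 1, 2],
    "rating": [3, 2, 4, 3, 4, 4, 2],
})

# Columns that are constant within a movie go into the group keys,
# so they survive the aggregation without a separate join.
keys = ["movie id", "movie title", "release date", "genre"]

# len counts the rows in each group; 'mean' averages the ratings.
result = df.groupby(keys, as_index=False).agg({"user id": len, "rating": "mean"})
print(result)
```

One thing to keep in mind: len counts every row in the group, while passing 'count' instead would skip rows where 'user id' is missing.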