Question

I have the following df (excerpt):

  movie id       movie title release date                                           IMDb URL                      genre  user id  rating  
0         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller        5       3  
1         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      268       2  
2         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      276       4  
3         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      217       3  
4         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller       87       4

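What I'm looking for is to count 'user id' and average 'rating' while keeping all other columns intact. So the result would look something like this: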

  movie id       movie title release date                                           IMDb URL                      genre  user id     rating  
0         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      50       3.75  
1         3  Four Rooms (1995)    1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      35       2.34

Any ideas how to do this?

Thanks

Answer 1

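If the values in the columns you are not aggregating are the same within each group, you can avoid a join by adding those columns to the groupby keys.

Then pass a dictionary of functions to agg. If you set as_index to False, the grouped-by columns are kept as regular columns instead of becoming the index: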

df.groupby(['movie id','movie title','release date','IMDb URL','genre'], as_index=False).agg({'user id':len,'rating':'mean'})

Note that len is used here to count the rows in each group.
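As a rough sketch of the same idea on a small made-up frame (the rows, the second movie's genre, and the names keep_cols and result are assumptions for illustration, not the asker's data), something like the following should work. It uses 'count' in place of len; the two agree when 'user id' has no missing values, but 'count' skips NaN while len counts every row.

```python
import pandas as pd

# Made-up sample data for illustration only (not the original dataset).
df = pd.DataFrame({
    "movie id": [2, 2, 2, 3, 3],
    "movie title": ["GoldenEye (1995)"] * 3 + ["Four Rooms (1995)"] * 2,
    "genre": ["Action|Adventure|Thriller"] * 3 + ["Thriller"] * 2,
    "user id": [5, 268, 276, 10, 20],
    "rating": [3, 2, 4, 5, 4],
})

# Columns whose values are constant within each movie go into the group keys,
# so they survive the aggregation without a join.
keep_cols = ["movie id", "movie title", "genre"]

# as_index=False keeps the group keys as regular columns.
result = df.groupby(keep_cols, as_index=False).agg(
    {"user id": "count", "rating": "mean"}
)

print(result)
```

One caveat: by default groupby drops rows whose key columns contain NaN, so if any of the kept columns can be missing, passing dropna=False to groupby (pandas 1.1 or later) may be needed.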

pandas - How to aggregate two columns and keep all other columns