我想输出一个数据框,显示每个列出类型的平均评分(基于“ AverageRating”)。
以下是该表的残缺版本:
MovieID Movie_title Action Adventure Animation AverageRating
1 Toy Story (1995) 0 0 1 3.878319
2 GoldenEye (1995) 1 1 0 3.206107
3 Four Rooms (1995) 0 0 0 3.033333
4 Get Shorty (1995) 1 0 0 3.550239
5 Copycat (1995) 0 0 0 3.302326
6 Shanghai Triad 0 0 0 3.576923
7 Twelve Monke(1995) 0 0 0 3.798469
8 Babe (1995) 0 0 0 3.995434
9 Dead Man W (1995) 0 0 0 3.896321
10 Richard III (1995) 0 0 0 3.831461
11 Seven (1995) 0 0 0 3.847458
12 Usual Suspec (1995) 0 0 0 4.385768
13 Mighty Aphro (1995) 0 0 0 3.418478
14 Postino, Il (1994) 0 0 0 3.967213
15 Mr. Holland's(1995) 0 0 0 3.778157
16 French Twist (1995) 0 0 0 3.205128
17 From Dusk Till 1 0 0 3.119565
18 White Balloon 0 0 0 2.800000
19 Antonia's Line 0 0 0 3.956522
20 Angels and Insects 0 0 0 3.416667
21 Muppet Treasure 1 1 0 2.761905
22 Braveheart (1995) 1 0 0 4.151515
23 Taxi Driver (1976) 0 0 0 4.120879
24 Rumble in the 1 1 0 3.448276
25 Birdcage 0 0 0 3.443686
因此,我需要拍摄Action = 1等的电影,计算这些电影的AverageRating平均值,然后创建一个数据框,例如:
AverageRating
Action 2.97
Adventure 3.14
Animation 3.30
使用pd.groupby,很容易完成一列:
df.groupby(['Action'])['AverageRating'].mean()
Action
0 3.095288
1 2.966332
Name: AverageRating, dtype: float64
...但是我正在努力研究如何以所需的方式一次处理多列。我知道我必须缺少一种简单的方法来做到这一点。任何帮助,不胜感激!
答案 0 :(得分:4)
IIUC,您可以将类别与AverageRating
相乘,然后对掩码取平均值:
cats = df.iloc[:, 2:-1]
s = cats.mul(df.AverageRating, axis='rows')
s.mask(s.eq(0)).mean()
# per comment, this is a better option
# s.mask(cats.eq(0)).mean()
输出:
Action 3.372934
Adventure 3.138763
Animation 3.878319
dtype: float64