Question

I start with the following DataFrame, where each row is a new trial:

    test_group   range     success
0      test        1-5         1
1      test        1-5         0
2      test        1-5         1
3      test        6-10        1
4      test        6-10        0
5      test        6-10        0
6      control     1-5         0
7      control     1-5         0
8      control     1-5         1
9      control     6-10        1
10     control     6-10        1
11     control     6-10        1

I want to calculate the mean success value, grouped by test group and range.

To do this, I wrote the following code:

df = df.groupby(['test_group', 'range']).success.mean()

My result looks like this:

test_group    range
test          1-5    0.66
              6-10   0.33
control       1-5    0.33
              6-10   1.00

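Ideally, I would like my final output to look like the following, so that I can plot both test groups on the same chart, with each range on the x-axis and the success rate on the y-axis: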

 test_group   range     success-rate
0      test        1-5         0.66
1      test        1-5         0.66
2      test        1-5         0.66
3      test        6-10        0.33
4      test        6-10        0.33
5      test        6-10        0.33
6      control     1-5         0.33
7      control     1-5         0.33
8      control     1-5         0.33
9      control     6-10        1.00
10     control     6-10        1.00
11     control     6-10        1.00

Answer 1

You can use the transform() method:

In [35]: df['success-rate'] = df.groupby(['test_group','range'])['success'].transform('mean')

In [36]: df
Out[36]:
   test_group range  success  success-rate
0        test   1-5        1      0.666667
1        test   1-5        0      0.666667
2        test   1-5        1      0.666667
3        test  6-10        1      0.333333
4        test  6-10        0      0.333333
5        test  6-10        0      0.333333
6     control   1-5        0      0.333333
7     control   1-5        0      0.333333
8     control   1-5        1      0.333333
9     control  6-10        1      1.000000
10    control  6-10        1      1.000000
11    control  6-10        1      1.000000

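GroupBy.transform() returns a result with the same index as the original DataFrame: the mean of each (test_group, range) group is repeated on every row of that group, so it can be assigned directly as a new column.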
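To make the broadcasting behaviour concrete, here is a small sketch that rebuilds the question's DataFrame and compares transform('mean') with the longer route of aggregating to one row per group and merging the means back onto the original rows. The variable names (keys, via_transform, means, via_merge) are illustrative only.

```python
import pandas as pd

# Same data as in the question: one row per trial
df = pd.DataFrame({
    "test_group": ["test"] * 6 + ["control"] * 6,
    "range": ["1-5"] * 3 + ["6-10"] * 3 + ["1-5"] * 3 + ["6-10"] * 3,
    "success": [1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})

keys = ["test_group", "range"]

# transform: one value per original row, aligned on df's index
via_transform = df.groupby(keys)["success"].transform("mean")

# Equivalent long way: aggregate to one row per group, then join back on the keys.
# A left merge keeps the row order of df[keys].
means = df.groupby(keys, as_index=False)["success"].mean()
via_merge = df[keys].merge(means, on=keys, how="left")["success"]
```

Both Series have one entry per trial; transform simply avoids the intermediate aggregated table and the merge.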

Add the computed column to the DataFrame, then plot two lines (one per test group).
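For the plot itself, the per-row column is not strictly required. One possible approach, sketched below, is to compute one mean per (range, test_group) pair and unstack test_group into columns, which gives the wide shape DataFrame.plot draws as one line per column. The Agg backend and the ImportError fallback are assumptions made so the sketch also runs on a headless machine or without matplotlib installed.

```python
import pandas as pd

# Same data as in the question: one row per trial
df = pd.DataFrame({
    "test_group": ["test"] * 6 + ["control"] * 6,
    "range": ["1-5"] * 3 + ["6-10"] * 3 + ["1-5"] * 3 + ["6-10"] * 3,
    "success": [1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})

# One row per range (x-axis), one column per test group (one line each)
rates = df.groupby(["range", "test_group"])["success"].mean().unstack("test_group")

try:
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend so this also runs headless
    ax = rates.plot(marker="o")  # x-axis label defaults to the index name "range"
    ax.set_ylabel("success rate")
except ImportError:
    ax = None  # matplotlib not installed; rates can still be inspected
```

Note that the range labels are strings, so they sort lexicographically; with more bins (for example "11-15") an ordered pd.Categorical for the range column would keep the x-axis in numeric order.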
