Question

     rev_id  worker_id  toxicity  toxicity_score
0    2232.0        723         0             0.0
1    2232.0       4000         0             0.0
2    2232.0       3989         0             1.0
3    2232.0       3341         0             0.0
4    2232.0       1574         0             1.0
5    2232.0       1508         0             1.0
6    2232.0        772         0             1.0
7    2232.0        680         0             0.0
8    2232.0        405         0             1.0
9    2232.0       4020         1            -1.0
10   4216.0        500         0             0.0
11   4216.0        599         0             0.0
12   4216.0        339         0             2.0
13   4216.0        257         0             0.0
14   4216.0        303         0             1.0
15   4216.0        188         0             0.0
16   4216.0       1549         0             1.0
17   4216.0         64         0             1.0
18   4216.0       1527         0             0.0
19   4216.0       1502         0             0.0
20   8953.0       2596         0             1.0
21   8953.0       2403         0             0.0
22   8953.0       2539         0             0.0
23   8953.0       2542         0             0.0
24   8953.0       2544         0             0.0
25   8953.0       1016         0             0.0
26   8953.0       2550         0             0.0
27   8953.0       2578         0             0.0
28   8953.0       2494         0             0.0
29   8953.0        971         0             0.0

Using pandas, I want to group by rev_id and get the mode (1 or 0) of toxicity and the mean of toxicity_score for each group. How can I do this? Thanks.

Answer 1

It seems you need groupby with agg, aggregating toxicity_score with mean and toxicity with mode:

# mode() returns a Series (several values on a tie), so take its first element
df = (df.groupby('rev_id', as_index=False)
        .agg({'toxicity_score':'mean', 'toxicity': lambda x: x.mode().iat[0]}))
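Why the first element matters: Series.mode always returns a Series, because a column can have several equally frequent values. The sketch below uses a small hypothetical frame (not the question's data) with a tie in one group, and a hypothetical helper first_mode that reduces the result to one scalar, which is what an aggregation function passed to agg should return. Since mode returns its values in sorted order, a tie resolves to the smallest value.

```python
import pandas as pd

# A tie: 0 and 1 each appear twice, so mode() returns both values, sorted.
tied = pd.Series([0, 1, 0, 1])
print(tied.mode().tolist())  # [0, 1]

# Hypothetical helper: collapse mode() to a single scalar for agg.
# mode() drops NaN by default and sorts its result, so on a tie this
# returns the smallest of the most frequent values.
def first_mode(s: pd.Series):
    return s.mode().iat[0]

# Hypothetical example data with two groups; group 2 has a 0/1 tie.
df = pd.DataFrame({
    "rev_id": [1, 1, 1, 2, 2, 2, 2],
    "toxicity": [0, 0, 1, 1, 0, 1, 0],
    "toxicity_score": [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, 0.0],
})

out = (df.groupby("rev_id", as_index=False)
         .agg({"toxicity_score": "mean", "toxicity": first_mode}))
print(out)
```

Group 1 gets toxicity 0 (two zeros, one one) and a mean score of 1.0; group 2 has a tie, so first_mode picks 0.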

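An alternative is value_counts, taking the first value of its index (value_counts sorts by count in descending order, so the first index label is the most frequent value):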

df = (df.groupby('rev_id', as_index=False)
        .agg({'toxicity_score':'mean', 'toxicity': lambda x: x.value_counts().index[0]}))
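To see what value_counts().index[0] picks, here is a sketch on the toxicity labels of rev_id 2232 from the question's table. The counts come back most frequent first, so index[0] is the most common label; idxmax on the counts gives the same label. If two labels share the top count, the order among them is best not relied upon here, whereas mode().iat[0] deterministically returns the smallest tied value.

```python
import pandas as pd

# toxicity labels for rev_id 2232, copied from the question's table
toxicity = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], name="toxicity")

counts = toxicity.value_counts()  # counts per label, most frequent first
print(counts)

most_common = counts.index[0]     # label with the largest count
same_label = counts.idxmax()      # equivalent: index label of the maximum count
print(most_common, same_label)    # 0 0
```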

print(df)
   rev_id  toxicity_score  toxicity
0  2232.0             0.4         0
1  4216.0             0.5         0
2  8953.0             0.1         0
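For a self-contained check, the sketch below rebuilds the question's 30 rows (worker_id omitted, since the aggregation does not use it) and reproduces the output above. It uses named aggregation (keyword=(column, function), available in pandas 0.25 and later) instead of a dict, which is a swapped-in style rather than what the answer uses; the result is the same.

```python
import pandas as pd

# The question's data: three rev_id groups of ten rows each.
df = pd.DataFrame({
    "rev_id": [2232.0] * 10 + [4216.0] * 10 + [8953.0] * 10,
    "toxicity": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1] + [0] * 10 + [0] * 10,
    "toxicity_score": [0, 0, 1, 0, 1, 1, 1, 0, 1, -1,
                       0, 0, 2, 0, 1, 0, 1, 1, 0, 0,
                       1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
})

# Named aggregation: output_column=(input_column, aggregation)
out = (df.groupby("rev_id")
         .agg(toxicity_score=("toxicity_score", "mean"),
              toxicity=("toxicity", lambda s: s.mode().iat[0]))
         .reset_index())
print(out)
```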

Pandas: group by a specified column and get the mean and mode