Question

I have the following df -

id            score
222.0         0.0           
222.0         0.0           
222.0         1.0           
222.0         0.0           
222.0         1.0           
222.0         1.0           
222.0         1.0           
222.0         0.0           
222.0         1.0           
222.0        -1.0           
416.0         0.0           
416.0         0.0           
416.0         2.0           
416.0         0.0           
416.0         1.0           
416.0         0.0           
416.0         1.0           
416.0         1.0           
416.0         0.0           
416.0         0.0           
895.0         1.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0

I want to calculate the mode of score for each group of rows that share the same id value. Something like this -

id            score
222.0         1.0           
416.0         0.0           
895.0         0.0

I tried this -

df['score'] = df.mode()['score']

But I got the following output -

id            score
222.0         0.0           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN          
416.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN

What is wrong here?

Answer 1

Group by id and apply mode to each group:

>>> df.score.groupby(df['id']).apply(lambda g: g.mode()).reset_index()[['id', 'score']]
      id    score
0   222.0   1.0
1   416.0   0.0
2   895.0   0.0
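
As for why the original attempt produced NaN: a likely explanation is that df.mode() works on the whole frame, column by column, and returns a result with its own 0-based index. Assigning that back to df['score'] aligns on index labels, so only the rows whose labels exist in the mode result get a value. Here is a minimal sketch that rebuilds the frame from the question and shows both the broken assignment and the grouped version. The variable names (whole_frame_mode, broken, per_id) are my own and are not from the question.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "id": [222.0] * 10 + [416.0] * 10 + [895.0] * 10,
    "score": [0, 0, 1, 0, 1, 1, 1, 0, 1, -1,
              0, 0, 2, 0, 1, 0, 1, 1, 0, 0,
              1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}, dtype=float)

# df.mode() computes the mode of each column over the whole frame.
# The result has its own 0-based index, one row per mode value.
whole_frame_mode = df.mode()["score"]

# Assignment aligns on index labels, so labels that are missing
# (or hold NaN) in the mode result end up as NaN in the column.
broken = df.copy()
broken["score"] = whole_frame_mode

# Per-id mode instead: group first, then take the mode inside each group.
per_id = (
    df.groupby("id")["score"]
      .apply(lambda g: g.mode())
      .reset_index()[["id", "score"]]
)
print(per_id)
```

Note that g.mode() returns every tied value, so an id with two equally frequent scores would show up as two rows in per_id.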

Answer 2

You could also use

In [79]: df.groupby('id').agg({'score': lambda x: x.value_counts().index[0]}).reset_index()
Out[79]:
      id  score
0  222.0    1.0
1  416.0    0.0
2  895.0    0.0

Or, use

In [80]: from scipy.stats.mstats import mode

In [81]: df.groupby('id').agg({'score': lambda x: mode(x)[0]}).reset_index()
Out[81]:
      id  score
0  222.0    1.0
1  416.0    0.0
2  895.0    0.0
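
One thing to watch with any of these is ties, since each lambda returns a single value per id. A small sketch with made-up data (the tied frame below is hypothetical, not from the question) showing how Series.mode() behaves when two scores are equally frequent: it returns all of them in sorted order, so taking .iloc[0] picks the smallest.

```python
import pandas as pd

# Hypothetical data: id 1 has a tie between 0.0 and 1.0.
tied = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "score": [1.0, 0.0, 1.0, 0.0, 3.0, 3.0, 5.0],
})

# Series.mode() returns every tied value, sorted ascending.
modes_for_id_1 = tied.loc[tied["id"] == 1, "score"].mode()

# Keep exactly one row per id by taking the first (smallest) mode.
first_mode = (
    tied.groupby("id")["score"]
        .agg(lambda g: g.mode().iloc[0])
        .reset_index()
)
print(first_mode)
```

If a different tie-breaking rule is wanted (say, the largest mode), swapping .iloc[0] for .iloc[-1] should do it.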

Calculating the mode of a column in Pandas using another column with the same row values
