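How to determine the most frequent categorical label across multiple columns in each row

Time: 2018-03-30 06:43:36

Tags: python pandas

I am trying to find the label that occurs most often across several columns in each row, and use that label to populate another pandas column.

For example, given this DataFrame: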

    Class_1     Class_2     Class_3
0   versicolor  setosa      setosa
1   virginica   versicolor  virginica
2   virginica   setosa      setosa
3   versicolor  setosa      setosa
4   versicolor  versicolor  virginica

Following that logic, I want to add a column named Predictions:

    Class_1     Class_2     Class_3    Predictions
0   versicolor  setosa      setosa     setosa
1   virginica   versicolor  virginica  virginica
2   virginica   setosa      setosa     setosa
3   versicolor  setosa      setosa     setosa
4   versicolor  versicolor  virginica  versicolor

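1 answer:

Answer 0 (score: 2)

Use apply with axis=1 to call value_counts on each row. The counts come back sorted in descending order, so the first index entry is the most common value in that row: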

df['Predictions'] = df.apply(lambda x: x.value_counts().index[0], axis=1)
print(df)
      Class_1     Class_2    Class_3 Predictions
0  versicolor      setosa     setosa      setosa
1   virginica  versicolor  virginica   virginica
2   virginica      setosa     setosa      setosa
3  versicolor      setosa     setosa      setosa
4  versicolor  versicolor  virginica  versicolor
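
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same row-wise value_counts idea. The frame is rebuilt from the sample data in the question, and the name `label_cols` is only an illustrative assumption, not part of the original answer. Selecting the label columns explicitly is a small precaution: once Predictions exists, re-running the one-liner on the whole frame would count that column as well.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class_1": ["versicolor", "virginica", "virginica", "versicolor", "versicolor"],
    "Class_2": ["setosa", "versicolor", "setosa", "setosa", "versicolor"],
    "Class_3": ["setosa", "virginica", "setosa", "setosa", "virginica"],
})

# Hypothetical name: the columns that hold the labels to vote on.
label_cols = ["Class_1", "Class_2", "Class_3"]

# value_counts sorts by frequency in descending order,
# so index[0] is the most common label in the row.
df["Predictions"] = df[label_cols].apply(
    lambda row: row.value_counts().index[0], axis=1
)
print(df)
```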

An alternative using Counter.most_common:

from collections import Counter

# index=False keeps the row index out of each tuple, so only the labels are counted
df['Predictions'] = [Counter(x).most_common(1)[0][0] for x in df.itertuples(index=False)]
print(df)
      Class_1     Class_2    Class_3 Predictions
0  versicolor      setosa     setosa      setosa
1   virginica  versicolor  virginica   virginica
2   virginica      setosa     setosa      setosa
3  versicolor      setosa     setosa      setosa
4  versicolor  versicolor  virginica  versicolor
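
The Counter variant can be sketched as a runnable script along the same lines. The helper `most_common_label` and the `label_cols` list are hypothetical names introduced here for illustration. The sketch assumes the row index should not take part in the vote, so it iterates with `itertuples(index=False)` over the label columns only.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class_1": ["versicolor", "virginica", "virginica", "versicolor", "versicolor"],
    "Class_2": ["setosa", "versicolor", "setosa", "setosa", "versicolor"],
    "Class_3": ["setosa", "virginica", "setosa", "setosa", "virginica"],
})

# Hypothetical name: the columns that hold the labels to vote on.
label_cols = ["Class_1", "Class_2", "Class_3"]


def most_common_label(values):
    """Return the label that occurs most often in an iterable of labels."""
    # most_common(1) returns a one-element list of (label, count) pairs.
    return Counter(values).most_common(1)[0][0]


# index=False yields tuples that contain only the label values.
df["Predictions"] = [
    most_common_label(row) for row in df[label_cols].itertuples(index=False)
]
print(df)
```

If a row has a tie, Counter orders equal counts by first occurrence (Python 3.7 and later), so this sketch should return the leftmost of the tied labels. Whether that is the right tie-breaking rule depends on the use case.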