I am trying to find the label that occurs most often across several columns and use it to set another pandas column.
For example, given this DataFrame:
Class_1 Class_2 Class_3
0 versicolor setosa setosa
1 virginica versicolor virginica
2 virginica setosa setosa
3 versicolor setosa setosa
4 versicolor versicolor virginica
I want to add a column named Predictions that holds the most frequent label in each row:
Class_1 Class_2 Class_3 Predictions
0 versicolor setosa setosa setosa
1 virginica versicolor virginica virginica
2 virginica setosa setosa setosa
3 versicolor setosa setosa setosa
4 versicolor versicolor virginica versicolor
Answer 0 (score: 2)
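Use apply with axis=1 to process each row, call value_counts on the row, and take the first index entry, which is the most common value in that row: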
df['Predictions'] = df.apply(lambda x: x.value_counts().index[0], axis=1)
print (df)
Class_1 Class_2 Class_3 Predictions
0 versicolor setosa setosa setosa
1 virginica versicolor virginica virginica
2 virginica setosa setosa setosa
3 versicolor setosa setosa setosa
4 versicolor versicolor virginica versicolor
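For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the example DataFrame from the question and restricts the apply call to the class columns. The class_cols name is my own addition, not part of the original answer, and it is there so that re-running the line does not count an existing Predictions column as a vote.

```python
import pandas as pd

# Rebuild the example data from the question.
df = pd.DataFrame({
    "Class_1": ["versicolor", "virginica", "virginica", "versicolor", "versicolor"],
    "Class_2": ["setosa", "versicolor", "setosa", "setosa", "versicolor"],
    "Class_3": ["setosa", "virginica", "setosa", "setosa", "virginica"],
})

# Hypothetical helper name: only these columns take part in the vote.
class_cols = ["Class_1", "Class_2", "Class_3"]

# value_counts sorts by frequency (descending), so index[0] is the
# most common label in the row.
df["Predictions"] = df[class_cols].apply(
    lambda row: row.value_counts().index[0], axis=1
)
print(df)
```

Note that a row with no repeated label (three different classes) has no clear winner, so the result for such a row depends on how value_counts orders ties.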
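Another solution uses Counter.most_common on each row. Pass index=False to itertuples so the row index is not counted as a label, and select only the class columns so an existing Predictions column is excluded:
from collections import Counter
class_cols = ['Class_1', 'Class_2', 'Class_3']
df['Predictions'] = [Counter(x).most_common(1)[0][0] for x in df[class_cols].itertuples(index=False)]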
print (df)
Class_1 Class_2 Class_3 Predictions
0 versicolor setosa setosa setosa
1 virginica versicolor virginica virginica
2 virginica setosa setosa setosa
3 versicolor setosa setosa setosa
4 versicolor versicolor virginica versicolor
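As a possible third option, not from the answer above, pandas also has DataFrame.mode, which can compute the row-wise mode directly when called with axis=1. The sketch below assumes the same example data. The class_cols name is again my own. mode returns a DataFrame with one column per tied value, so column 0 is taken as the prediction.

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({
    "Class_1": ["versicolor", "virginica", "virginica", "versicolor", "versicolor"],
    "Class_2": ["setosa", "versicolor", "setosa", "setosa", "versicolor"],
    "Class_3": ["setosa", "virginica", "setosa", "setosa", "virginica"],
})

# Hypothetical helper name: the columns that hold the class votes.
class_cols = ["Class_1", "Class_2", "Class_3"]

# mode(axis=1) computes the most frequent value(s) of each row.
# The result has columns 0, 1, ... (one per tied mode); column 0
# always exists, so it is used as the single prediction.
df["Predictions"] = df[class_cols].mode(axis=1)[0]
print(df)
```

When a row has a tie, mode returns the tied values in sorted order, so column 0 would be the alphabetically first label. That may or may not be the tie-breaking rule you want.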