I have a pandas DataFrame like this:
Index, cat1, cat2, cat3
1,0.3,0.1,0.4
2,0.5,0.1,0.2
3,0.1,0.4,0.3
I want to create a fifth column, "max_cat", that holds the name of the category column with the highest value in each row, like this:
Index, cat1, cat2, cat3, max_cat
1,0.3,0.1,0.4, cat3
2,0.5,0.1,0.2, cat1
3,0.1,0.4,0.3, cat2
How can I do this, preferably with pandas?
Here is my code:
import pandas as pd
from io import StringIO
data = StringIO("""
Index, cat1, cat2, cat3
1,0.3,0.1,0.4
2,0.5,0.1,0.2
3,0.1,0.4,0.3
""")
df = pd.read_csv(data, skiprows=1, header=0, names=["cat1","cat2","cat3"])
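In the read_csv call above, skiprows=1 drops the blank first line, header=0 marks the next line as the header, and names= replaces it. Each data row has four fields but only three names are given, so pandas should use the extra leading field (the Index values) as the row index. A minimal sketch that repeats the call and inspects the result:

```python
import pandas as pd
from io import StringIO

data = StringIO("""
Index, cat1, cat2, cat3
1,0.3,0.1,0.4
2,0.5,0.1,0.2
3,0.1,0.4,0.3
""")
# Three names for four fields per row: the first field becomes the row index
df = pd.read_csv(data, skiprows=1, header=0, names=["cat1", "cat2", "cat3"])

print(df.index.tolist())    # row labels taken from the Index column
print(df.columns.tolist())  # only the three category columns remain
```

With this parse the frame contains only the category columns, whereas the frame printed in the answer shows Index as an ordinary column with a 0-based row index, so it was not produced by this exact call.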
Answer (score: 1)
If you are not worried about ties, you can call idxmax with axis=1 on the cat columns:
>>> df['max_cat'] = df[['cat1', 'cat2', 'cat3']].idxmax(axis=1)
>>> df
Index cat1 cat2 cat3 max_cat
0 1 0.3 0.1 0.4 cat3
1 2 0.5 0.1 0.2 cat1
2 3 0.1 0.4 0.3 cat2
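One way to get the layout shown above (Index as a regular column) is to parse the data with skipinitialspace=True instead of the question's names= call; that parsing choice is an assumption for illustration, not necessarily how the answer's frame was built. A self-contained sketch:

```python
import pandas as pd
from io import StringIO

data = StringIO("""
Index, cat1, cat2, cat3
1,0.3,0.1,0.4
2,0.5,0.1,0.2
3,0.1,0.4,0.3
""")
# skipinitialspace=True strips the space after each comma, so the header
# gives "Index", "cat1", "cat2", "cat3"; the blank first line is skipped
# because skip_blank_lines defaults to True.
df = pd.read_csv(data, skipinitialspace=True)

# For each row, idxmax(axis=1) returns the label of the column holding the maximum
df["max_cat"] = df[["cat1", "cat2", "cat3"]].idxmax(axis=1)
print(df)
```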
If you need to handle ties, see this question.
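On ties: idxmax returns only the first column that reaches the row maximum. A minimal sketch of one way to list every tied column instead, by comparing each cell with its row maximum; the data and the all_max_cats column name are hypothetical:

```python
import pandas as pd

# Hypothetical data: in the second row, cat1 and cat3 tie at 0.5
df = pd.DataFrame({"cat1": [0.3, 0.5], "cat2": [0.1, 0.1], "cat3": [0.4, 0.5]})
cats = df[["cat1", "cat2", "cat3"]]

# idxmax keeps only the first column that attains the maximum
df["max_cat"] = cats.idxmax(axis=1)

# Boolean mask: True wherever a cell equals its row's maximum
is_max = cats.eq(cats.max(axis=1), axis=0)
# Join the names of all columns that share the row maximum
df["all_max_cats"] = is_max.apply(lambda row: ",".join(row[row].index), axis=1)
print(df)
```

Exact equality is fine here because the maximum is one of the row's own values; for results of arithmetic you may prefer a tolerance-based comparison.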