Question

I have a df with about 50 columns:

Product ID | Cat1 | Cat2 |Cat3 | ... other columns ...
8937456       0      5     10
8497534       25     3     0
8754392       4      15    7

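Each Cat column says how many units of that product fall into that category. Now I want to add a "Category" column giving the majority category for each product (ignoring the other columns and considering only the Cat columns).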

df_goal:

Product ID | Cat1 | Cat2 |Cat3 | Category | ... other columns ...
8937456       0      5     10       3
8497534       25     3     0        1
8754392       4      15    7        2

I think I need to use max together with apply or map?

I found the following on Stack Overflow, but they don't address the category assignment. In Excel, I renamed the columns from Cat 1 to 1 and used index(match(max)).

Python Pandas max value of selected columns

How should I take the max of 2 columns in a dataframe and make it another column?

Assign new value in DataFrame column based on group max

Answer 1

Here's a NumPy way with numpy.argmax, assuming every column after the first is a Cat column -

df['Category'] = df.values[:,1:].argmax(1)+1
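As a rough sketch of why the column selection matters: with the sample data from the question plus one extra non-Cat column (the "Price" column here is made up purely for illustration), taking argmax over everything after the first column picks up the extra column, while selecting the Cat columns first gives the expected result.

```python
import pandas as pd

# Sample data from the question; "Price" is a hypothetical extra column
# standing in for the "... other columns ..." part.
df = pd.DataFrame({
    "Product ID": [8937456, 8497534, 8754392],
    "Cat1": [0, 25, 4],
    "Cat2": [5, 3, 15],
    "Cat3": [10, 0, 7],
    "Price": [99, 99, 99],
})

# Everything after the first column: "Price" is included and wins every row.
naive = df.to_numpy()[:, 1:].argmax(axis=1) + 1

# Only the Cat columns; +1 turns 0-based positions into 1-based category numbers.
cat_cols = ["Cat1", "Cat2", "Cat3"]
df["Category"] = df[cat_cols].to_numpy().argmax(axis=1) + 1

print(naive.tolist())           # [4, 4, 4]
print(df["Category"].tolist())  # [3, 1, 2]
```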

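To restrict the selection to just those columns, use the column headers/names explicitly, then use idxmax, and finally replace the string Cat with an empty string, like so -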

df['Category'] = df[['Cat1','Cat2','Cat3']].idxmax(1).str.replace('Cat','')
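Note that the idxmax version leaves the new column holding strings ('3', '1', '2'). If integers are wanted, as in df_goal, one possible follow-up is a cast with astype(int). A minimal sketch on the question's sample data:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Product ID": [8937456, 8497534, 8754392],
    "Cat1": [0, 25, 4],
    "Cat2": [5, 3, 15],
    "Cat3": [10, 0, 7],
})

cat_cols = ["Cat1", "Cat2", "Cat3"]

# Label of the column holding each row's maximum.
labels = df[cat_cols].idxmax(axis=1)

# Strip the literal "Cat" prefix, then cast the remaining digits to int.
df["Category"] = labels.str.replace("Cat", "", regex=False).astype(int)

print(labels.tolist())          # ['Cat3', 'Cat1', 'Cat2']
print(df["Category"].tolist())  # [3, 1, 2]
```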

numpy.argmax or pandas' idxmax basically gets us the ID of the max element along an axis.
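To make the difference between the two concrete: argmax gives back a 0-based position, idxmax gives back the column label. Both return the first occurrence when a row has a tie. The small frame below is invented just to show that behaviour.

```python
import pandas as pd

# Illustrative data only; the second row has a tie between Cat1 and Cat2.
scores = pd.DataFrame({
    "Cat1": [0, 7],
    "Cat2": [5, 7],
    "Cat3": [10, 2],
})

positions = scores.to_numpy().argmax(axis=1)  # 0-based column positions
labels = scores.idxmax(axis=1)                # column labels

print(positions.tolist())  # [2, 0]
print(labels.tolist())     # ['Cat3', 'Cat1']
```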

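If we know the Cat columns sit at column positions 1 up to (but not including) 4, we can slice the dataframe by position: df.iloc[:,1:4] instead of df[['Cat1','Cat2','Cat3']].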
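A quick check that the positional slice and the name-based selection pick the same columns, again using the question's sample data with one hypothetical trailing column ("Price") to stand in for the other columns:

```python
import pandas as pd

# Sample data from the question; "Price" is a made-up extra column.
df = pd.DataFrame({
    "Product ID": [8937456, 8497534, 8754392],
    "Cat1": [0, 25, 4],
    "Cat2": [5, 3, 15],
    "Cat3": [10, 0, 7],
    "Price": [99, 99, 99],
})

by_position = df.iloc[:, 1:4]             # columns at positions 1, 2, 3
by_name = df[["Cat1", "Cat2", "Cat3"]]    # the same columns, picked by label

print(list(by_position.columns))          # ['Cat1', 'Cat2', 'Cat3']
print(by_position.equals(by_name))        # True
```

The positional form is shorter when there are many Cat columns, but it silently breaks if the column order changes, so the name-based form is the safer default.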

Pandas (python): max in columns define new value in new column
