Question

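I have a Pandas DataFrame, and I want to build a new column that holds the value with the largest magnitude from a set of other columns (i.e. the most positive or most negative value, so -4 would be picked over 2). I can use .abs().idxmax to find which column has the largest absolute value: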

df[["col A","col B"]].abs().idxmax(axis=1)

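This gives me a nice array, which can be converted to a list if needed, ["col A", "col B", ...] and so on, showing which column has the largest magnitude in each row. But how can I use it to construct a new column from the values at those positions?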

This does not work:

df["newcol"] = df[df[["col A","col B"]].abs().idxmax(axis=1)]

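It still fails even if I put .loc in there, or use the values (or a list of the values) from the previous output. Is there a pandas-native (non-looping) way to build a new column in which each row's value is picked according to a list of column names?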

Answer 1

Instead of idxmax, it would be easier (and faster) to use np.where:

condition = df['col A'].abs() >= df['col B'].abs()
df['new col'] = np.where(condition, df['col A'], df['col B'])

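# Complete, runnable example
import numpy as np
import pandas as pd
np.random.seed(2015)  # fixed seed so the output is reproducible
# 10 rows of random integers in the range -5..4
df = pd.DataFrame(np.random.randint(10, size=(10,2))-5, columns=['col A', 'col B'])
# True where col A has the larger (or equal) magnitude
condition = df['col A'].abs() >= df['col B'].abs()
df['new col'] = np.where(condition, df['col A'], df['col B'])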

yields

   col A  col B  new col
0     -3     -3       -3
1      4      1        4
2      3      0        3
3      2      3        3
4     -5      1       -5
5      2      3        3
6     -2      3        3
7      1      4        4
8     -3     -2       -3
9     -4     -3       -4
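np.where as written compares exactly two columns. As a minimal sketch of how the same idea might extend to more columns, one option is to take the argmax of the absolute values in NumPy and pick one value per row by position. The third column 'col C' and the sample values are hypothetical, added only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical three-column frame; 'col C' is not part of the original example.
df = pd.DataFrame({
    'col A': [-3, 4, 1],
    'col B': [2, -5, 1],
    'col C': [0, 1, -6],
})
cols = ['col A', 'col B', 'col C']

values = df[cols].to_numpy()
# Position of the largest absolute value in each row (the first one wins on ties).
pos = np.abs(values).argmax(axis=1)
# Pick one value per row: row i, column pos[i].
df['new col'] = values[np.arange(len(df)), pos]

print(df)
```

The signed value is kept because the absolute value is used only to find the position, not to produce the result.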

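If starting from idx = df[["col A","col B"]].abs().idxmax(axis=1) is a requirement, then you can select the desired values by building a MultiIndex from idx.index and idx.values, and then using df.stack().loc[idx] to select the values: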

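import numpy as np
import pandas as pd
np.random.seed(2015)
df = pd.DataFrame(np.random.randint(10, size=(10,2))-5, columns=['col A', 'col B'])

# Label of the column with the largest absolute value in each row
idx = df[["col A","col B"]].abs().idxmax(axis=1)
# Pair each row label with its chosen column label
idx = pd.MultiIndex.from_arrays([idx.index, idx.values])
# stack() gives a Series indexed by (row, column), so .loc[idx] picks one value per row
df['new_col'] = df.stack().loc[idx].values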

This produces the same values as above (stored here in new_col).
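As for why the attempt in the question fails: indexing a DataFrame with a Series of column labels selects whole columns, one per label, so the result has one column per row instead of one value per row. The sketch below shows that on a small made-up frame, then shows an alternative to stack() that translates the labels into integer positions with df.columns.get_indexer and uses NumPy indexing. The sample values are illustrative assumptions, not the data from the example above:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are made up for this sketch).
df = pd.DataFrame({'col A': [-3, 4, 2], 'col B': [1, -5, 2]})
cols = ['col A', 'col B']

# One column label per row: the column with the largest absolute value.
idx = df[cols].abs().idxmax(axis=1)

# Indexing with a Series of labels selects entire columns, one per label,
# so this is a len(df) x len(df) frame rather than a single column.
wide = df[idx]
print(wide.shape)  # (3, 3)

# Translate the labels to integer column positions, then pair row i
# with column col_pos[i] to get exactly one value per row.
col_pos = df.columns.get_indexer(idx)
df['new col'] = df.to_numpy()[np.arange(len(df)), col_pos]

print(df)
```

This works for any number of columns, since it only needs one label per row.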

Python Pandas: using an array to select a different column for each value of a new column
