Question

I have a pandas DataFrame like this:

  Windows Linux Mac
0 True    False False
1 False   True  False
2 False   False True

I want to merge these three columns into a single column like this:

  OS
0 Windows
1 Linux
2 Mac

I know I can write a simple function like this:

def aggregate_os(row):
    # Return the name of the first OS column that is True for this row.
    if row['Windows']:
        return 'Windows'
    if row['Linux']:
        return 'Linux'
    if row['Mac']:
        return 'Mac'

and call it like this:

df['OS'] = df.apply(aggregate_os, axis=1)

The problem is that my dataset is large and this solution is too slow. Is there a more efficient way to do this aggregation?

Answer 1

`idxmax`

df.idxmax(1).to_frame('OS')

        OS
0  Windows
1    Linux
2      Mac
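
This works because `True` compares greater than `False`, so `idxmax` along axis 1 returns the label of the first `True` column in each row. One caveat: a row with no `True` at all would still get the first column label. A minimal sketch of guarding against that, assuming a hypothetical fourth row that is all `False` (not part of the question's data):

```python
import pandas as pd

# The question's frame plus one hypothetical all-False row (index 3).
df = pd.DataFrame({
    'Windows': [True, False, False, False],
    'Linux':   [False, True, False, False],
    'Mac':     [False, False, True, False],
})

# Label of the first True column per row.
os_col = df.idxmax(axis=1)

# Rows without any True would wrongly read 'Windows', so blank them out.
os_col = os_col.where(df.any(axis=1))

result = os_col.to_frame('OS')
```

Whether missing values or some placeholder string is the better fill is a design choice that depends on the downstream code.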

`np.select`

pd.DataFrame(
    {'OS': np.select([*map(df.get, df)], [*df])},
    df.index
)

        OS
0  Windows
1    Linux
2      Mac
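
Here `[*map(df.get, df)]` is the list of boolean columns and `[*df]` is the list of column names, so `np.select` picks, for each row, the name of the first column that is `True`. A more spelled-out sketch of the same idea follows. The `'Unknown'` default and the extra all-`False` row are assumptions added for illustration; a string default also keeps the fallback the same type as the choices, which some NumPy versions may require.

```python
import numpy as np
import pandas as pd

# The question's frame plus one hypothetical all-False row (index 3).
df = pd.DataFrame({
    'Windows': [True, False, False, False],
    'Linux':   [False, True, False, False],
    'Mac':     [False, False, True, False],
})

# One boolean condition array per column, and the matching label for each.
conditions = [df[col].to_numpy() for col in df.columns]
choices = list(df.columns)

# The first condition that holds wins; rows matching nothing get the default.
labels = np.select(conditions, choices, default='Unknown')

result = pd.DataFrame({'OS': labels}, index=df.index)
```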

`dot`

df.dot(df.columns).to_frame('OS')

        OS
0  Windows
1    Linux
2      Mac
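
The `dot` trick relies on `True * 'Windows'` being `'Windows'` and `False * 'Windows'` being the empty string, with the per-row sum acting as string concatenation. That also means a row with several `True` values gets its labels glued together. A small sketch of making that case readable with a separator, assuming a hypothetical row where both `Windows` and `Linux` are `True`:

```python
import pandas as pd

# The question's frame plus one hypothetical row with two True values.
df = pd.DataFrame({
    'Windows': [True, False, False, True],
    'Linux':   [False, True, False, True],
    'Mac':     [False, False, True, False],
})

# Append a separator to every label, then strip the trailing one.
labels = df.dot(df.columns + ',').str.rstrip(',')

result = labels.to_frame('OS')
```

Rows with no `True` at all come out as an empty string with this approach.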

`np.where`

Assuming exactly one `True` per row:

pd.DataFrame(
    {'OS': df.columns[np.where(df)[1]]},
    df.index
)

        OS
0  Windows
1    Linux
2      Mac
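
`np.where` on a 2-D boolean array returns the row and column positions of every `True` cell, so the second array indexes straight into `df.columns`. If some row has zero or several `True` values, the number of positions no longer matches the number of rows. A sketch that checks the assumption first (the error message is just an illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Windows': [True, False, False],
    'Linux':   [False, True, False],
    'Mac':     [False, False, True],
})

# Fail early if the one-True-per-row assumption does not hold.
if not (df.sum(axis=1) == 1).all():
    raise ValueError('expected exactly one True per row')

# Positions of the True cells, in row-major order.
row_pos, col_pos = np.where(df.to_numpy())

result = pd.DataFrame({'OS': df.columns[col_pos]}, index=df.index)
```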

Answer 2

Use boolean indexing together with `stack` and `rename`:

df_new = df.stack()
df_new[df_new].reset_index(level=1).rename(columns={'level_1':'OS'}).drop(columns=0)

Output:

        OS
0  Windows
1    Linux
2      Mac
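
`stack` turns the frame into a Series indexed by (row label, column name), so filtering the Series by its own values keeps only the `True` cells. A variant sketch that reads the column names directly from the index instead of relying on the auto-generated `level_1` name; the all-`False` fourth row and the final `reindex` are assumptions added to show what happens to rows with no match:

```python
import pandas as pd

# The question's frame plus one hypothetical all-False row (index 3).
df = pd.DataFrame({
    'Windows': [True, False, False, False],
    'Linux':   [False, True, False, False],
    'Mac':     [False, False, True, False],
})

stacked = df.stack()          # Series indexed by (row label, column name)
true_cells = stacked[stacked]  # keep only the True cells

result = pd.DataFrame(
    {'OS': true_cells.index.get_level_values(1)},
    index=true_cells.index.get_level_values(0),
)

# Rows without any True drop out above; reindex brings them back as missing.
result = result.reindex(df.index)
```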

Melt multiple boolean columns into a single column in pandas
