Question

I have a pandas DataFrame that looks like this:

speaker  Scarlett Johanson  Mark Ruffalo  Chris Evans
0                 0.790857      1.044091     0.984198
1                 0.895030      0.672590     1.072131
2                 0.925493      0.078618     0.800736
3                 0.296032      0.550027     0.978062
4                 0.669364      0.499356     0.940024

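What I want to achieve: if a row's minimum is greater than a threshold (e.g. 0.3), the value should be 'noise'; otherwise it should be the name of the column that holds that minimum.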

For example: row 0 -> the minimum is about 0.79, which is greater than 0.3, so 'noise'.

Row 2 -> the minimum is about 0.079, which is less than 0.3, so the value should be 'Mark Ruffalo'.

I'm trying to put this in a new column, e.g. 'Final Result'.
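For reference, the rule can be written as a direct row-by-row function. This is a minimal sketch, not one of the answers below: it rebuilds the sample data from the table (leaving out the 'speaker' label), uses a hypothetical helper `label_row`, and applies it with `DataFrame.apply(axis=1)`, which is easy to read but slower than a vectorized approach on large frames.

```python
import pandas as pd

# Sample data rebuilt from the table above (the 'speaker' label is left out).
df = pd.DataFrame(
    {
        "Scarlett Johanson": [0.790857, 0.895030, 0.925493, 0.296032, 0.669364],
        "Mark Ruffalo": [1.044091, 0.672590, 0.078618, 0.550027, 0.499356],
        "Chris Evans": [0.984198, 1.072131, 0.800736, 0.978062, 0.940024],
    }
)

THRESH = 0.3  # example threshold from the question


def label_row(row: pd.Series) -> str:
    """Hypothetical helper: 'noise' if the row minimum exceeds THRESH,
    otherwise the name of the column holding that minimum."""
    if row.min() > THRESH:
        return "noise"
    return row.idxmin()


# axis=1 passes each row to label_row as a Series indexed by column name.
df["Final Result"] = df.apply(label_row, axis=1)
print(df["Final Result"].tolist())
```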

I tried something like this:

d['final'] = np.where(d.min(axis=1) >= 0.3, 'noise', 'no_noise')

but I don't know how to replace the text 'no_noise' with the column header. Thanks in advance for any help.

Answer 1

Solution 1: df.idxmin

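Use idxmin to find where the minimum is: it returns the label of the first occurrence of the minimum along the requested axis, so with axis=1 you get the column name for each row.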
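A small illustration of that behaviour, assuming a made-up two-row frame `demo` (not from the question): with `axis=1`, `idxmin` returns column labels, and when two columns tie for the minimum, the first one is returned.

```python
import pandas as pd

# Hypothetical frame: row "b" has a tie between columns "x" and "y".
demo = pd.DataFrame(
    {"x": [3.0, 1.0], "y": [2.0, 1.0], "z": [5.0, 4.0]},
    index=["a", "b"],
)

# axis=1 scans across each row and returns the column label of its minimum.
labels = demo.idxmin(axis=1)
print(labels.to_dict())  # {'a': 'y', 'b': 'x'}; on a tie the first column wins
```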

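import numpy as np

# set speaker as index so it's out of the way
df.set_index('speaker', inplace=True)
# set your threshold
thresh = 0.3
# use np.where with `df.idxmin` as the other: rows whose minimum exceeds
# the threshold get 'noise', all others get the column label of their minimum
df['final'] = np.where(df.min(axis=1) > thresh, 'noise', df.idxmin(axis=1))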

>>> df
         Scarlett Johanson  Mark Ruffalo  Chris Evans              final
speaker                                                                 
0                 0.790857      1.044091     0.984198              noise
1                 0.895030      0.672590     1.072131              noise
2                 0.925493      0.078618     0.800736       Mark Ruffalo
3                 0.296032      0.550027     0.978062  Scarlett Johanson
4                 0.669364      0.499356     0.940024              noise
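If you prefer to stay within pandas, `Series.where` can play the role of `np.where`. This is a swapped-in alternative, sketched on the sample data rebuilt from the question: it keeps the `idxmin` label where the row minimum is at or below the threshold and substitutes 'noise' elsewhere, so the result stays a Series aligned with the frame's index.

```python
import pandas as pd

# Sample data rebuilt from the question (the 'speaker' label is left out).
df = pd.DataFrame(
    {
        "Scarlett Johanson": [0.790857, 0.895030, 0.925493, 0.296032, 0.669364],
        "Mark Ruffalo": [1.044091, 0.672590, 0.078618, 0.550027, 0.499356],
        "Chris Evans": [0.984198, 1.072131, 0.800736, 0.978062, 0.940024],
    }
)
thresh = 0.3

# Column label of each row's minimum ...
min_col = df.idxmin(axis=1)
# ... kept where the row minimum is <= thresh, replaced by 'noise' elsewhere
# (the same boundary as `df.min(axis=1) > thresh` -> 'noise' in Solution 1).
df["final"] = min_col.where(df.min(axis=1) <= thresh, "noise")
print(df["final"].tolist())
```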

Solution 2: np.argmin

You can use np.argmin to find the position of each row's minimum, and use that result to index the column names inside the np.where call:

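import numpy as np

# start from the original df (not the one already modified by Solution 1)
# set speaker as index so it's out of the way
df.set_index('speaker', inplace=True)
# set your threshold
thresh = 0.3
# use np.where and np.argmin: argmin returns the column *position* of each
# row's minimum, which is then used to index df.columns
df['final'] = np.where(df.min(axis=1) > thresh, 'noise', df.columns[np.argmin(df.to_numpy(), axis=1)])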

>>> df
         Scarlett Johanson  Mark Ruffalo  Chris Evans              final
speaker                                                                 
0                 0.790857      1.044091     0.984198              noise
1                 0.895030      0.672590     1.072131              noise
2                 0.925493      0.078618     0.800736       Mark Ruffalo
3                 0.296032      0.550027     0.978062  Scarlett Johanson
4                 0.669364      0.499356     0.940024              noise
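To see how the positional approach relates to Solution 1, here is a sketch (sample data rebuilt from the question) that computes both versions side by side: `argmin` yields column positions, here [0, 1, 1, 0, 1], which are turned into names by indexing `df.columns`. On this data both solutions give the same labels.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (the 'speaker' label is left out).
df = pd.DataFrame(
    {
        "Scarlett Johanson": [0.790857, 0.895030, 0.925493, 0.296032, 0.669364],
        "Mark Ruffalo": [1.044091, 0.672590, 0.078618, 0.550027, 0.499356],
        "Chris Evans": [0.984198, 1.072131, 0.800736, 0.978062, 0.940024],
    }
)
thresh = 0.3

values = df.to_numpy()  # plain 2-D float array, one row per speaker
positions = values.argmin(axis=1)  # column position of each row's minimum

# Solution 1: label-based lookup with idxmin.
sol1 = np.where(df.min(axis=1) > thresh, "noise", df.idxmin(axis=1))
# Solution 2: positional lookup, positions index into df.columns.
sol2 = np.where(values.min(axis=1) > thresh, "noise", df.columns[positions])

assert (sol1 == sol2).all()
print(positions.tolist(), sol2.tolist())
```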

Get the DataFrame column header if the value is below the threshold, otherwise put 'noise'
