Question

I like np.where, but I have never fully got to grips with it.

I have a DataFrame; let's say it looks like this:

import pandas as pd
import numpy as np
from numpy import nan as NA
DF = pd.DataFrame({'a' : [ 3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b' : [ 3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c' : [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd' : [5, 1, 2 ,1, 1 ,22, 30, 1, 0, 0, 0]})

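What I want to do is replace the 0 values with NaN in every row where all values are zero. Critically, I want to keep the existing values in any row where not all values are zero.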

I want to do something like this:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, ???)

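I put ??? to indicate that I don't know what value to put there when the condition is False; I just want to preserve whatever is already there. Is this possible with np.where, or should I use another technique?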

Answer 1

For this sort of task there is a pandas.Series method (called where, incidentally). It seems a little backward at first, but from the documentation:

Series.where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)

Return an object of the same shape as self, whose corresponding entries are from self where cond is True and otherwise are from other.
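To make the "backward" semantics concrete, here is a minimal sketch on toy data (the Series `s` and the name `keep` are made up for illustration, not taken from the question). The condition passed to where marks the entries to keep, not the entries to replace; mask is the mirror image.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 0, 1, 0])   # toy data, not from the question
keep = s > 0                  # True where the existing value should be kept

# where() keeps entries where the condition is True and fills the rest with `other`
kept = s.where(keep, np.nan)

# mask() is the mirror image: it replaces entries where the condition is True
masked = s.mask(~keep, np.nan)
```

Both calls should give the same result here, with the zeros turned into NaN and the column upcast to float.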

So, your example would become

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    # Assign the result back; an in-place call on DF[col] may act on a temporary copy
    DF[col] = DF[col].where(~condition, np.nan)

However, if all you are trying to do is replace the all-zero rows with NA for a specific set of columns, you could do this instead

DF.loc[condition, cols] = NA
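A self-contained sketch of the .loc approach, using a shortened, hypothetical version of the question's data. One assumption on my part: the columns are converted to float first, because NaN is a float and, depending on the pandas version, assigning it into integer columns may trigger an upcasting warning.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the question's DataFrame
DF = pd.DataFrame({'a': [3, 0, 1, 0],
                   'b': [3, 0, 1, 0],
                   'c': [0, 0, 0, 0],
                   'd': [5, 1, 2, 0]})
cols = ['a', 'b', 'c', 'd']

# NaN is a float, so convert explicitly instead of relying on implicit upcasting
out = DF.astype(float)

# True only for rows in which every selected column is zero
condition = (out[cols] == 0).all(axis=1)

# Overwrite the all-zero rows; every other row keeps its existing values
out.loc[condition, cols] = np.nan
```

Only the last row is all zeros, so only that row becomes NaN; zeros in the other rows are left alone.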

Edit

To answer the original question: np.where follows the same broadcasting rules as other array operations, so you would replace ??? with DF[col], changing your example to:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, DF[col])
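Because of those broadcasting rules, the per-column loop can also be collapsed into a single call. This is a sketch on toy data, not part of the original answer; the names `values`, `replaced` and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 0], 'b': [0, 0, 2]})   # toy data
cols = ['a', 'b']

values = DF[cols].to_numpy()
condition = (values == 0).all(axis=1)   # shape (n_rows,)

# condition[:, None] has shape (n_rows, 1), so it broadcasts across all columns;
# the third argument supplies the existing values wherever the condition is False
replaced = np.where(condition[:, None], np.nan, values)

result = pd.DataFrame(replaced, index=DF.index, columns=cols)
```

The reshaping step matters: a one-dimensional condition of length n_rows would not line up with the rows of a two-dimensional array unless the row and column counts happen to match.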

Answer 2

The suggested solutions work, but for plain numpy arrays there is a simpler way that does not involve a DataFrame.

The solution is: np_array[np.where(condition)] = value_of_condition_true_rows
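A minimal sketch of that idea on a toy array (the names `arr` and `row_is_zero` are hypothetical). It assumes a float array, since an integer array cannot hold NaN.

```python
import numpy as np

# Float dtype so the array can hold NaN
arr = np.array([[3.0, 0.0],
                [0.0, 0.0],
                [0.0, 2.0]])

row_is_zero = (arr == 0).all(axis=1)   # one boolean per row

# Assigning through the boolean mask overwrites only the selected rows in place;
# arr[np.where(row_is_zero)] = np.nan selects the same rows via index arrays
arr[row_is_zero] = np.nan
```

Rows where the condition is False are never touched, so no "else" value is needed.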

Answer 3

You can do the following:

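    # 1 where the value is kept, 0 where it falls below the threshold
    array_binary = np.where(array < threshold, 0, 1)
    # multiply by the original array so the kept values are restored
    array_sparse = np.multiply(array_binary, array)

np.multiply performs an element-wise multiplication of the binary array and the original array, so the elements that pass the threshold are restored and everything else becomes zero. array_sparse is the sparse version of the array.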
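For comparison, here is a small sketch of thresholding while keeping the surviving values, on hypothetical data (`arr` and `threshold` are made-up names). It shows the mask-and-multiply route next to a single np.where call that passes the array itself as the "else" value.

```python
import numpy as np

arr = np.array([0.2, 1.5, 0.7, 3.0])   # hypothetical data
threshold = 1.0

# Route 1: build a 0/1 mask, then multiply it with the original values
mask = np.where(arr < threshold, 0, 1)
sparse_via_multiply = mask * arr

# Route 2: let np.where pick the existing value when the condition is False
sparse_direct = np.where(arr < threshold, 0, arr)
```

Both routes should agree here; the second avoids the intermediate mask.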

Using np.where but keeping the existing value when the condition is False
