Question

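I have a numpy array/matrix of shape (4096,4096) and an array of elements that should be set to zero. I found that numpy.in1d works correctly, but it makes my computation slow. Is there a faster way to do this? I need to repeat it on a very large number of matrices, so every optimization helps.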

Here is an example:

The numpy array looks like this:

npArr = np.array([
    [1, 4, 5, 5, 3],
    [2, 5, 6, 6, 1],
    [0, 0, 1, 0, 0],
    [3, 3, 2, 4, 3]])

The other array is:

arr = np.array([3,5,8])

After the replacement, npArr should look like this:

array([[ 1,  4,  0,  0,  0],
       [ 2,  0,  6,  6,  1],
       [ 0,  0,  1,  0,  0],
       [ 0,  0,  2,  4,  0]])
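For reference, here is a minimal sketch of the mask-based baseline the question describes. It uses np.isin, which returns a mask with the same shape as its first argument. np.in1d flattens its first argument and needs a reshape, and recent NumPy releases deprecate it in favor of np.isin. The sketch reproduces the expected output above:

```python
import numpy as np

npArr = np.array([[1, 4, 5, 5, 3],
                  [2, 5, 6, 6, 1],
                  [0, 0, 1, 0, 0],
                  [3, 3, 2, 4, 3]])
arr = np.array([3, 5, 8])

# Boolean mask, same shape as npArr: True where the element occurs in arr
mask = np.isin(npArr, arr)
# Zero out the matching elements in place
npArr[mask] = 0
```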

Answer 1

Here is an alternative using np.searchsorted -

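def in1d_alternative_2D(npArr, arr):
    # Position at which each element of npArr would be inserted into the sorted arr
    idx = np.searchsorted(arr, npArr.ravel())
    # Elements larger than every value in arr get len(arr), an invalid index; clamp to 0
    idx[idx==len(arr)] = 0
    # An element is in arr exactly when arr at its insertion position equals it
    return arr[idx].reshape(npArr.shape) == npArr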

It assumes that arr is sorted. If it is not, we need to sort it first and then use the posted method.
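A minimal sketch of that "sort first" step, wrapped in a hypothetical helper named in1d_alternative_2D_unsorted. It assumes arr is non-empty, because indexing an empty array with idx would fail:

```python
import numpy as np

def in1d_alternative_2D_unsorted(npArr, arr):
    # Hypothetical helper: searchsorted requires sorted input, so sort a copy of arr
    sorted_arr = np.sort(arr)
    idx = np.searchsorted(sorted_arr, npArr.ravel())
    # Values past the largest element get len(sorted_arr); clamp to a valid index
    idx[idx == len(sorted_arr)] = 0
    # True where the element of npArr equals the sorted value at its insertion point
    return sorted_arr[idx].reshape(npArr.shape) == npArr
```

Sorting arr costs O(k log k) for k search values, which is small next to the 4096×4096 lookups, so it should be cheap to do on every call.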

Sample run -

In [90]: npArr = np.array([[1, 4, 5, 5, 3],
    ...:     [2, 5, 6, 6, 1],
    ...:     [0, 0, 1, 0, 0],
    ...:     [3, 3, 2, 14, 3]])
    ...: 
    ...: arr = np.array([3,5,8])
    ...: 

In [91]: in1d_alternative_2D(npArr, arr)
Out[91]: 
array([[False, False,  True,  True,  True],
       [False,  True, False, False, False],
       [False, False, False, False, False],
       [ True,  True, False, False,  True]], dtype=bool)

In [92]: npArr[in1d_alternative_2D(npArr, arr)] = 0

In [93]: npArr
Out[93]: 
array([[ 1,  4,  0,  0,  0],
       [ 2,  0,  6,  6,  1],
       [ 0,  0,  1,  0,  0],
       [ 0,  0,  2, 14,  0]])
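The sample run uses 14, which is larger than every value in arr. Tracing the intermediate steps on a few values from npArr shows why the idx == len(arr) clamp is needed:

```python
import numpy as np

arr = np.array([3, 5, 8])          # sorted search values
values = np.array([1, 3, 4, 14])   # a few elements taken from npArr

idx = np.searchsorted(arr, values)
print(idx)                  # [0 0 1 3]  (14 lies past the end, so idx == len(arr))
idx[idx == len(arr)] = 0    # clamp the out-of-range position to a valid index
print(arr[idx])             # [3 3 5 3]
print(arr[idx] == values)   # [False  True False False]
```

After clamping, 14 is compared with arr[0] == 3, so it correctly comes out False rather than raising an IndexError.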

Benchmarking against numpy.in1d

The equivalent solution using np.in1d would be:

np.in1d(npArr, arr).reshape(npArr.shape)

Let's time the proposed approach against it and verify the results for the sizes mentioned in the question.

In [85]: # (4096, 4096) shaped 'npArr' and search array 'arr' of 1000 elems
    ...: npArr = np.random.randint(0,10000,(4096,4096))
    ...: arr = np.sort(np.random.choice(10000, 1000, replace=0 ))
    ...: 

In [86]: out1 = np.in1d(npArr, arr).reshape(npArr.shape)
    ...: out2 = in1d_alternative_2D(npArr, arr)
    ...: 

In [87]: np.allclose(out1, out2)
Out[87]: True

In [88]: %timeit np.in1d(npArr, arr).reshape(npArr.shape)
1 loops, best of 3: 3.04 s per loop

In [89]: %timeit in1d_alternative_2D(npArr, arr)
1 loops, best of 3: 1 s per loop
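To reproduce the comparison outside IPython, here is a sketch using the standard timeit module. The helper names isin_mask and run_benchmark and the fixed random seed are assumptions for illustration. The masks are compared with np.array_equal because they are boolean. Absolute timings depend on the machine and NumPy version, so none are claimed here:

```python
import timeit
import numpy as np

def in1d_alternative_2D(npArr, arr):
    # Same as the answer above: assumes arr is sorted
    idx = np.searchsorted(arr, npArr.ravel())
    idx[idx == len(arr)] = 0
    return arr[idx].reshape(npArr.shape) == npArr

def isin_mask(npArr, arr):
    # Baseline: shape-preserving membership test
    return np.isin(npArr, arr)

def run_benchmark(shape=(4096, 4096), n_needles=1000, high=10000, repeat=3):
    rng = np.random.default_rng(0)  # assumed seed, for repeatable inputs
    npArr = rng.integers(0, high, size=shape)
    arr = np.sort(rng.choice(high, n_needles, replace=False))
    # Both approaches must produce the identical mask before timing them
    assert np.array_equal(isin_mask(npArr, arr), in1d_alternative_2D(npArr, arr))
    results = {}
    for name, fn in [("isin", isin_mask), ("searchsorted", in1d_alternative_2D)]:
        # Best of `repeat` single runs, like IPython's "best of 3"
        times = timeit.repeat(lambda fn=fn: fn(npArr, arr), number=1, repeat=repeat)
        results[name] = min(times)
    return results

if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name}: {seconds:.3f} s")
```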

Answer 2

If you have numba, you could use a custom function that doesn't need an intermediate mask:

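import numpy as np
import numba as nb

@nb.njit
def replace_where(arr, needle, replace):
    # Flat view of arr (a view only if arr is C-contiguous), so writes reach the original
    arr = arr.ravel()
    # Hash set of the values to replace: average O(1) membership test per element
    needles = set(needle)
    for idx in range(arr.size):
        if arr[idx] in needles:
            arr[idx] = replace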
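One caveat: arr.ravel() returns a view only when the array is C-contiguous. In NumPy, and as far as I can tell in numba as well, a transposed or sliced input gives a copy, so the function above would update the copy and leave the caller's array unchanged. Here is a minimal 2-D sketch that indexes with (i, j) instead, so writes always reach the original buffer. The name replace_where_2d is hypothetical, and the fallback to plain Python when numba is missing is also an assumption, added only so the sketch runs (slowly) without numba:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, run the same loop as plain (slow) Python
    def njit(func):
        return func

@njit
def replace_where_2d(arr, needle, replace):
    # Hypothetical 2-D variant: (i, j) indexing writes into arr itself,
    # even when arr is a non-contiguous view such as a transpose
    needles = set(needle)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            if arr[i, j] in needles:
                arr[i, j] = replace
```

Usage: `replace_where_2d(npArr.T, arr, 0)` zeroes the matches in npArr through its transposed view.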

This gives the correct result for your example:

npArr = np.array([[1, 4, 5, 5, 3],
                  [2, 5, 6, 6, 1],
                  [0, 0, 1, 0, 0],
                  [3, 3, 2, 4, 3]])

arr = np.array([3,5,8])

replace_where(npArr, arr, 0)
print(npArr)
# [[1 4 0 0 0]
#  [2 0 6 6 1]
#  [0 0 1 0 0]
#  [0 0 2 4 0]]

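It should be really fast. I timed it for several array sizes, and it was 5 to 20 times faster than np.in1d (depending on the sizes, especially the size of arr).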

Answer 3

Another solution using numpy broadcasting:

np.min(np.where(npArr[None,:,:] == arr[:,None,None], 0, npArr),0)
Out[730]: 
array([[1, 4, 0, 0, 0],
       [2, 0, 6, 6, 1],
       [0, 0, 1, 0, 0],
       [0, 0, 2, 4, 0]])
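The np.min trick sets matching positions to 0 and keeps the minimum. This only works when npArr has no negative values: a matching -3 would stay -3, because min(0, -3) is -3. The broadcast also builds an intermediate of shape (len(arr), 4096, 4096). With the 1000-element arr from the benchmark above, that is about 16.8 GB even as booleans. Here is a sketch that keeps the broadcasting idea but collapses the comparison with .any(axis=0). This removes the sign assumption but not the memory cost. zero_matches_broadcast is a hypothetical name:

```python
import numpy as np

def zero_matches_broadcast(npArr, arr):
    # (len(arr), rows, cols) boolean array: True where npArr equals arr[k]
    hits = npArr[None, :, :] == arr[:, None, None]
    # Collapse the search-value axis: True where npArr matches any value in arr
    mask = hits.any(axis=0)
    # Zero the matches, keep everything else (negative values included)
    return np.where(mask, 0, npArr)
```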

Set specific values to zero in a numpy array
