Question

I want to find the smallest 2-D sub-array of an ndarray that contains all the values satisfying a condition.

For example, suppose I have the array

import numpy as np

x = np.array([[1, 1, 5, 3, 11, 1],
              [1, 2, 15, 19, 21, 33],
              [1, 8, 17, 22, 21, 31],
              [3, 5, 6,  11, 23, 19]])

and call f(x, x % 2 == 0). The function should then return the array

[[2, 15, 19],
 [8, 17, 22],
 [5, 6, 11]]

because it is the smallest rectangular array that contains all the even numbers (the condition).

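I found a way to do this: use np.argwhere to find all the indices where the condition is true, then slice the original array from the minimum to the maximum of those indices. I've done it with a for loop, but I'd like to know whether there is a more efficient way to do it with numpy or scipy.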
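For comparison, here is a loop-free sketch of the argwhere idea described above. It is not the asker's code (which isn't shown); crop_argwhere is a hypothetical name. It takes the per-axis minimum and maximum of the True coordinates and slices between them. It assumes at least one element satisfies the condition, since .min() on an empty index array raises ValueError.

```python
import numpy as np

def crop_argwhere(arr, cond_arr):
    """Bounding-box crop via np.argwhere (hypothetical helper name)."""
    idx = np.argwhere(cond_arr)      # shape (n_true, ndim): coordinates of every True
    lo = idx.min(axis=0)             # smallest index along each axis
    hi = idx.max(axis=0) + 1         # one past the largest index along each axis
    return arr[tuple(slice(l, h) for l, h in zip(lo, hi))]

x = np.array([[1, 1, 5, 3, 11, 1],
              [1, 2, 15, 19, 21, 33],
              [1, 8, 17, 22, 21, 31],
              [3, 5, 6, 11, 23, 19]])
print(crop_argwhere(x, x % 2 == 0))
```

Because it builds one slice per axis, this sketch also works for arrays with more than two dimensions.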

My current method:


Answer 1

Your function is already quite efficient, but you can do better.

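Instead of checking the condition for each row/column and then finding the minimum and maximum, we can collapse the condition onto each axis (a reduction with logical OR) and find the first/last index: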

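import numpy as np

def f2(arr, cond_arr):
    # Columns containing at least one True (collapse the rows, axis=0)
    cols = np.where(np.logical_or.reduce(cond_arr, axis=0))[0]
    # Rows containing at least one True (collapse the columns, axis=1)
    rows = np.where(np.logical_or.reduce(cond_arr, axis=1))[0]
    # Rows index the first axis, columns the second
    return arr[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]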

How it works:

For the example data, the condition array cond_arr looks like this:

>>> (x%2==0).astype(int)
array([[0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 0, 0]])

This is the column condition:

>>> np.logical_or.reduce(cond_arr, axis=0).astype(int)
array([0, 1, 1, 1, 0, 0])

And this is the row condition:

>>> np.logical_or.reduce(cond_arr, axis=1).astype(int)
array([0, 1, 1, 1])

Now we only need to find the first and last nonzero element in each of these two arrays.
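A small sketch of that last step on the example data. It uses np.flatnonzero, which for a 1-D mask returns the same indices as np.where(mask)[0]. The first and last True positions are inclusive bounds, so each slice stops at last + 1.

```python
import numpy as np

x = np.array([[1, 1, 5, 3, 11, 1],
              [1, 2, 15, 19, 21, 33],
              [1, 8, 17, 22, 21, 31],
              [3, 5, 6, 11, 23, 19]])
cond_arr = x % 2 == 0

# 1-D masks from the OR-reductions shown above
col_mask = np.logical_or.reduce(cond_arr, axis=0)   # one entry per column
row_mask = np.logical_or.reduce(cond_arr, axis=1)   # one entry per row

# Positions of the True entries in each mask
cols = np.flatnonzero(col_mask)
rows = np.flatnonzero(row_mask)

# First and last True give inclusive bounds, so each stop is last + 1
row_slice = slice(int(rows[0]), int(rows[-1]) + 1)
col_slice = slice(int(cols[0]), int(cols[-1]) + 1)

print(row_slice, col_slice)      # slice(1, 4, None) slice(1, 4, None)
print(x[row_slice, col_slice])
```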

Is it really faster?

%timeit f(x, x%2 == 0)   #  10000 loops, best of 3: 24.6 µs per loop
%timeit f2(x, x%2 == 0)  # 100000 loops, best of 3: 12.6 µs per loop

Well, a little... but it really shines on larger arrays:

x = np.random.randn(1000, 1000)
c = np.zeros((1000, 1000), dtype=bool)
c[400:600, 400:600] = True

%timeit f(x,c)   #  100 loops, best of 3: 5.28 ms per loop
%timeit f2(x,c)  # 1000 loops, best of 3: 225 µs per loop
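To reproduce a comparison like this outside IPython, here is a sketch using the standard timeit module. The asker's f isn't shown, so crop_argwhere is a hypothetical argwhere-based stand-in for it. crop_reduce follows the same idea as f2, but uses ndarray.any, which for a boolean array gives the same result as np.logical_or.reduce. The printed timings depend on the machine and NumPy version, so none are claimed here.

```python
import timeit
import numpy as np

def crop_argwhere(arr, cond_arr):
    # Hypothetical stand-in for an argwhere-based crop (the asker's f is not shown)
    idx = np.argwhere(cond_arr)
    lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
    return arr[tuple(slice(l, h) for l, h in zip(lo, hi))]

def crop_reduce(arr, cond_arr):
    # Same idea as f2: OR-reduce the mask onto each axis, take first/last True
    rows = np.flatnonzero(cond_arr.any(axis=1))
    cols = np.flatnonzero(cond_arr.any(axis=0))
    return arr[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

x = np.random.randn(1000, 1000)
c = np.zeros((1000, 1000), dtype=bool)
c[400:600, 400:600] = True

# Both approaches must select the same block before their timings mean anything
assert np.array_equal(crop_argwhere(x, c), crop_reduce(x, c))

for fn in (crop_argwhere, crop_reduce):
    # Best of 3 repeats, 20 calls each, reported per call
    t = min(timeit.repeat(lambda: fn(x, c), number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {t * 1e6:.1f} µs per call")
```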

Finally, this version has slightly more overhead, but it works for any number of dimensions:

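def f3(arr, cond_arr):
    s = []
    for a in range(arr.ndim):
        # Collapse every axis except `a`, leaving a 1-D mask along axis `a`
        others = tuple(i for i in range(arr.ndim) if i != a)
        c = np.where(np.logical_or.reduce(cond_arr, axis=others))[0]
        s.append(slice(c[0], c[-1] + 1))
    # Index with a tuple; recent NumPy versions reject a list of slices here
    return arr[tuple(s)]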
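Like f2, the per-axis approach raises IndexError when no element satisfies the condition, because c[0] is taken from an empty array. Below is a sketch of an N-d variant with a guard for that case, plus a 3-D usage check. crop_nd is a hypothetical name. Returning an empty array is one possible choice; raising a clearer error would be another.

```python
import numpy as np

def crop_nd(arr, cond_arr):
    """N-d bounding-box crop (hypothetical name) with an empty-mask guard."""
    if not cond_arr.any():
        # Nothing satisfies the condition: return an empty array with the same ndim
        return arr[(slice(0, 0),) * arr.ndim]
    slices = []
    for a in range(arr.ndim):
        # Collapse every axis except `a`, leaving a 1-D mask along axis `a`
        others = tuple(i for i in range(arr.ndim) if i != a)
        idx = np.flatnonzero(cond_arr.any(axis=others))
        slices.append(slice(idx[0], idx[-1] + 1))
    return arr[tuple(slices)]

cube = np.zeros((4, 5, 6), dtype=int)
cube[1, 2, 3] = 7
cube[2, 3, 1] = 9
print(crop_nd(cube, cube > 0).shape)     # (2, 2, 3)
print(crop_nd(cube, cube > 100).shape)   # (0, 0, 0)
```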

Numpy array of minimal size (cropping)
