Question

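I have a 3-channel numpy array and I want to apply a function to each pixel. Specifically, I want to process an image and return a grayscale image that highlights where a particular colour appears. If a pixel's red, green, and blue channels are within an L2 distance of 10 from the colour (30, 70, 130), set that pixel's value in the grayscale image to 255; otherwise set it to 0.

My current approach is: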

def L2_dist(p1,p2):
    dist = ( (p1[0]-p2[0] )**2 + (p1[1]-p2[1] )**2 + (p1[2]-p2[2] )**2 ) **0.5
    if dist<10: return 255
    return 0

def colour_img(image):
    colour = my_colour  # target colour, e.g. (30, 70, 130)
    img_dim = image.shape
    new_img = np.zeros((img_dim[0],img_dim[1])) # no alpha channel replica
    for r in range(img_dim[0]):      # rows
        for c in range(img_dim[1]):  # columns
            pixel = image[r,c,:3]
            new_img[r,c] = L2_dist(colour,pixel)
    return new_img

But it is very slow. How can I do this faster, without using loops?

Answer 1

Simple one-line solution

You can do what you need in a single line, like this:

new_img = (((image - color)**2).sum(axis=2)**.5 <= 10) * 255

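Optimized two-line solution

The line above is not the most efficient way to do everything the OP needs. Here is a noticeably faster approach (thanks to Paul Panzer for the optimization suggestion in the comments; readability is not guaranteed):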

d = image - color
new_img = (np.einsum('...i, ...i', d, d) <= 100) * 255
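
To make the einsum call less opaque, here is a small sketch on a hand-made 1x2 image (the pixel values are made up for illustration). It checks that `np.einsum('...i,...i', d, d)` gives the same per-pixel sum of squares as the spelled-out `(d ** 2).sum(axis=-1)`:

```python
import numpy as np

color = np.array([30, 70, 130])
# Hypothetical 1x2 image: the first pixel is close to color, the second is far away
image = np.array([[[33, 74, 130], [0, 0, 0]]])

d = image - color                          # color is broadcast over every pixel
sq_einsum = np.einsum('...i,...i', d, d)   # multiply d by itself, sum over the last axis
sq_plain = (d ** 2).sum(axis=-1)           # same quantity, written out explicitly

# Compare squared distance with squared threshold, so no square root is needed
new_img = (sq_einsum <= 10 ** 2) * 255
print(sq_einsum)  # [[   25 22700]]
print(new_img)    # [[255   0]]
```

The first pixel differs from `color` by (3, 4, 0), so its squared distance is 25 and it passes the threshold of 100.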

Timings:

Given some 100x100-pixel test data:

import numpy as np

color = np.array([30, 70, 130])
# random data within [20,60,120]-[40,80,140] for demo purposes
image = np.random.randint(10*2 + 1, size=[100,100,3]) + color - 10

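Here is a timing comparison of the OP's method against the solutions from this answer. The one-line solution is about 100x faster than the OP's solution, and the fully optimized solution is about 300x faster than the OP's: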

%%timeit
# OP's code
img_dim = image.shape
new_img = np.zeros((img_dim[0],img_dim[1])) # no alpha channel replica
for r in range(img_dim[0]):      # rows
    for c in range(img_dim[1]):  # columns
        pixel = image[r,c,:3]
        new_img[r,c] = L2_dist(color,pixel)

43.8 ms ± 502 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
# one line solution
new_img = (((image - color)**2).sum(axis=2)**.5 <= 10) * 255

439 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
# fully optimized solution
d = image - color
new_img = (np.einsum('...i, ...i', d, d) <= 100) * 255

145 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
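
The `%%timeit` cells above need IPython/Jupyter. If you want to repeat the comparison from a plain script, something like the following sketch should work with the standard-library `timeit` module. The function names `one_line` and `einsum_version`, the seeded random generator, and the repeat counts are my own choices for illustration, and the numbers you get will depend on your machine:

```python
import timeit
import numpy as np

color = np.array([30, 70, 130])
rng = np.random.default_rng(0)  # seeded only so the demo is reproducible
image = rng.integers(0, 21, size=(100, 100, 3)) + color - 10

def one_line(image, color):
    # distance with an explicit square root, compared to 10
    return (((image - color) ** 2).sum(axis=2) ** .5 <= 10) * 255

def einsum_version(image, color):
    # squared distance via einsum, compared to 10**2
    d = image - color
    return (np.einsum('...i,...i', d, d) <= 100) * 255

# Both versions should agree before their speed is compared
same = np.array_equal(one_line(image, color), einsum_version(image, color))
print("results agree:", same)

for fn in (one_line, einsum_version):
    # best of 5 runs, each run calling the function 200 times
    best = min(timeit.repeat(lambda: fn(image, color), number=200, repeat=5)) / 200
    print(f"{fn.__name__}: {best * 1e6:.1f} us per call")
```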

Explanation of the simple one-line solution

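The simple one-liner works as follows:

Find the Euclidean distance between each pixel in image (an array of shape (m, n, 3)) and color (an array of shape (3,)).
Check whether each of those distances is within 10, and return a boolean array that is True wherever the condition is met and False otherwise.
A boolean array is really just an array of 0s and 1s, so we multiply the boolean array by 255 to get the desired final result.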
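
The same steps can be written out one at a time. This is only an illustrative sketch on a made-up 2x2 image; the intermediate names `diff`, `dist` and `within` do not appear in the one-liner itself:

```python
import numpy as np

color = np.array([30, 70, 130])                      # shape (3,)
image = np.array([[[30, 70, 130], [36, 78, 130]],
                  [[30, 70, 141], [200, 10, 10]]])   # shape (2, 2, 3)

# Step 1: Euclidean distance of every pixel from color
diff = image - color                      # broadcasting: (2, 2, 3) - (3,) -> (2, 2, 3)
dist = np.sqrt((diff ** 2).sum(axis=2))   # shape (2, 2)

# Step 2: boolean array, True where the distance is within 10
within = dist <= 10

# Step 3: True/False behave like 1/0, so multiplying by 255 gives 255/0
result = within * 255
print(result)
# [[255 255]
#  [  0   0]]
```

The pixel (36, 78, 130) sits at distance exactly 10 and is kept because the one-liner uses `<=`; the pixel (30, 70, 141) is at distance 11 and is rejected.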

Explanation of the optimized solution

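Here is the list of optimizations used:

Use einsum to compute the sum of squares needed for the distance calculation. Under the hood, einsum makes use of the BLAS library that Numpy wraps to compute the needed sum of products, so it should be faster.
Skip the square root by comparing the squared distance against the squared threshold.
I tried to find a way to minimize array allocation/copying, but that actually made things slower. The version I tried allocates exactly two arrays (one for the intermediate result, one for the final result) and makes no other copies.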
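
For reference, a version that fits that description (two allocations, everything else in place) could look roughly like the sketch below. Treat it as an assumption about what such code might be, not as the code that was actually benchmarked: the name `scratch` and the choice of `float64` are mine.

```python
import numpy as np

color = np.array([30, 70, 130])
rng = np.random.default_rng(0)  # seeded only so the demo is reproducible
image = rng.integers(0, 21, size=(100, 100, 3)) + color - 10

scratch = image.astype(np.float64)   # allocation 1: working copy for intermediate results
scratch -= color                     # in-place subtraction, no new array
new_img = np.einsum('...i,...i', scratch, scratch)  # allocation 2: squared distances
np.less_equal(new_img, 100, out=new_img)            # write the 0/1 mask into the same array
new_img *= 255                                      # in-place scaling to 0/255
```

Note that `new_img` ends up as a float array holding 0.0 and 255.0, which is one price of reusing the buffer.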

Answer 2

You can do it like this:

color = np.array([30, 70, 130])
L2 = np.sqrt(np.sum((image - color) ** 2, axis=2))  # L2 distance of each pixel from color

img_dim = image.shape
new_img = np.zeros((img_dim[0], img_dim[1]))
new_img[L2 < 10] = 255

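However, as you can see, we traverse the array twice: first to compute L2, and then to apply the threshold with L2 < 10. We can improve on this by using a nested loop, as in your code. Loops in Python are slow, though, so JIT-compile the functions to get the fastest version. Below I use numba: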

import numba as nb
import numpy as np

@nb.njit(cache=True)
def L2_dist(p1,p2):
    # squared distance, compared against the squared threshold (10**2)
    dist = (p1[0]-p2[0] )**2 + (p1[1]-p2[1] )**2 + (p1[2]-p2[2] )**2
    if dist < 100: return 255
    return 0

@nb.njit(cache=True)
def color_img(image):
    # `color` is a module-level array; numba freezes globals at compile time
    n_rows, n_cols, _ = image.shape
    new_img = np.zeros((n_rows, n_cols), dtype=np.int32)
    for r in range(n_rows):
        for c in range(n_cols):
            pixel = image[r, c, :3]
            new_img[r,c] = L2_dist(color,pixel)
    return new_img
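
If you would rather not rely on a global `color`, a variant that takes the colour and the threshold as arguments can be sketched as below. This is not the code that was timed here, just an illustration: the name `highlight_color`, the `uint8` output, and the no-op fallback decorator (used only when numba is not installed) are assumptions of mine.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba installed
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

@njit
def highlight_color(image, color, threshold):
    n_rows, n_cols = image.shape[0], image.shape[1]
    new_img = np.zeros((n_rows, n_cols), dtype=np.uint8)
    thr_sq = threshold * threshold          # compare squares, no sqrt needed
    for r in range(n_rows):
        for c in range(n_cols):
            dist_sq = 0
            for k in range(3):              # only the first three channels
                diff = image[r, c, k] - color[k]
                dist_sq += diff * diff
            if dist_sq < thr_sq:            # strict "<", as in the question
                new_img[r, c] = 255
    return new_img

color = np.array([30, 70, 130])
rng = np.random.default_rng(0)
# Deliberately non-square test image, to exercise the row/column indexing
image = rng.integers(0, 21, size=(40, 60, 3)) + color - 10
result = highlight_color(image, color, 10)
```

Passing `color` explicitly avoids the surprise of a global being baked into the compiled function.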

Timings:

# @tel's fully optimised solution(using einsum to short circuit np to get to BLAS directly, the no sqrt trick)
128 µs ± 6.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# JITed version without the sqrt trick
30.8 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

# JITed version with the sqrt trick
24.8 µs ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

HTH.

How to iterate over an array to apply a threshold to each pixel
