Question

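I am writing an image-processing package using NumPy operations. I have noticed that the operations inside a nested loop are expensive, and I would like to speed them up.

The input is a 512 x 1024 image, which is preprocessed into an edge set: a list of ndarrays, one per edge i, each of shape (Ni, 2).

Next, the nested for loops go through the edge set and do some math.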

### preprocessing: img ===> contour set

import random

import cv2
import numpy as np

img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
high_thresh, _ = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
lowThresh = 0.5*high_thresh
b = cv2.Canny(img, lowThresh, high_thresh)
edgeset, _ = cv2.findContours(b, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)

imgH = img.shape[0]               ## 512
imgW = img.shape[1]               ## 1024
num_edges = len(edgeset)          ## ~900
min_length_segment_vp = imgH/6    ## ~85



### nested for loop

for i in range(num_edges):
    if(edgeset[i].shape[0] > min_length_segment_vp):

        # points: (N, 1, 2) ==> uv: (N, 2)
        uv = edgeset[i].reshape(edgeset[i].shape[0], edgeset[i].shape[2])
        uv = np.unique(uv, axis=0)

        theta = -(uv[:, 1]-imgH/2)*np.pi/imgH
        phi = (uv[:, 0]-imgW/2)*2*np.pi/imgW
        xyz = np.zeros((uv.shape[0], 3))
        xyz[:, 0] = np.sin(phi) * np.cos(theta)
        xyz[:, 1] = np.cos(theta) * np.cos(phi)
        xyz[:, 2] = np.sin(theta)
    
        ##xyz: (N, 3)
        N=xyz.shape[0]
    
        for _ in range(10):
            if(xyz.shape[0] > N * 0.1):
                bestInliers = np.array([])
                bestOutliers = np.array([])

                ####
                ####  this is the loop to watch!
                ####
                for _ in range(1000):
                    id0 = random.randint(0, xyz.shape[0]-1)
                    id1 = random.randint(0, xyz.shape[0]-1)
                    if(id0 == id1):
                        continue

                    n = np.cross(xyz[id0, :], xyz[id1, :])
                    n = n / np.linalg.norm(n)

                    cosTetha = n @ xyz.T
                    inliers = np.abs(cosTetha) < threshold
                    outliers = np.where(np.invert(inliers))[0]
                    inliers = np.where(inliers)[0]
              
                    if inliers.shape[0] > bestInliers.shape[0]:
                        bestInliers = inliers
                        bestOutliers = outliers

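What I have tried:

I replaced np.cross and np.linalg.norm with my own cross and norm functions that only handle ndarrays of shape (3,). This brought the run time down from ~0.9 s to about 0.3 s on my i5-4460 CPU.
I profiled my code and found that the code in the innermost loop still takes 2/3 of the time.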
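
For readers wondering what such shape-(3,) helpers might look like, here is a minimal sketch. The names `cross3` and `norm3` are hypothetical and are not the asker's actual functions; the idea is only that skipping the general-purpose argument handling in np.cross and np.linalg.norm can matter when the call sits in a hot loop. Any speedup would have to be measured on your own machine.

```python
import math

import numpy as np


def cross3(a, b):
    """Cross product of two length-3 vectors (hypothetical helper)."""
    return np.array([
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ])


def norm3(v):
    """Euclidean norm of a length-3 vector (hypothetical helper)."""
    return math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
```

In the inner loop these would stand in for the two NumPy calls, for example `n = cross3(xyz[id0], xyz[id1])` followed by `n = n / norm3(n)`.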

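What I think I could try next:

Compile the code with Cython and add some cdef annotations.
Translate the whole file to C++.
Use a faster library for the computation, such as numexpr.
Vectorize the loop (but I don't know how).

Can I make this faster? Please give me some suggestions! Thanks!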
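
On the last idea in the list, vectorizing the loop: one possible shape is to draw all 1000 random pairs at once, build every candidate normal in a single np.cross call, and score all candidates with one matrix product. The sketch below is an assumption about how that could be done, not a tested replacement for the code above. The function name `best_great_circle` and the parameters `n_trials` and `rng` are hypothetical, it uses NumPy's random generator instead of random.randint, and it drops degenerate pairs (zero-length cross product) rather than dividing by zero. It also allocates an (n_trials, N) matrix, so memory grows with the number of points.

```python
import numpy as np


def best_great_circle(xyz, threshold, n_trials=1000, rng=None):
    """Score n_trials random candidate planes at once (hypothetical helper).

    xyz is an (N, 3) array of unit vectors. Returns the inlier and outlier
    indices of the candidate with the most inliers.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_points = xyz.shape[0]

    # Draw every random pair up front and drop pairs with id0 == id1.
    id0 = rng.integers(0, n_points, size=n_trials)
    id1 = rng.integers(0, n_points, size=n_trials)
    keep = id0 != id1
    id0, id1 = id0[keep], id1[keep]

    # One normal per remaining pair: shape (T, 3).
    normals = np.cross(xyz[id0], xyz[id1])
    lengths = np.linalg.norm(normals, axis=1)
    valid = lengths > 0  # skip parallel pairs instead of dividing by zero
    normals = normals[valid] / lengths[valid, None]
    if normals.shape[0] == 0:
        return np.array([], dtype=int), np.arange(n_points)

    # cos of the angle between every normal and every point: shape (T, N).
    cos_theta = normals @ xyz.T
    inlier_mask = np.abs(cos_theta) < threshold

    # argmax returns the first best candidate, like the strict '>' in the loop.
    best = np.argmax(inlier_mask.sum(axis=1))
    return np.flatnonzero(inlier_mask[best]), np.flatnonzero(~inlier_mask[best])
```

A call such as `bestInliers, bestOutliers = best_great_circle(xyz, threshold)` would then take the place of the `for _ in range(1000)` loop.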

Answer 1

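This question is broad, so I will only give a few non-obvious tips based on my own experience.

If you use Cython, you may want to change your for loops into while loops. I have gotten fairly large (x5) speedups this way, although it may not help in every case;
Code that would be considered inefficient in regular Python, such as nested while (or for) loops that apply a function to an array one element at a time, can sometimes be optimized by Cython to run faster than the equivalent vectorized NumPy approach;
Find out which NumPy functions take the most time, and write your own versions in the form that Cython can optimize most easily (see above).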
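
As an illustration of the style these tips describe, here is the inlier test from the question written as an explicit element-by-element while loop. This is only a sketch: `count_inliers` is a hypothetical name, the code is plain Python so that it runs as is, and the Cython type declarations that would make it fast are left out. Run uncompiled, a loop like this is expected to be slower than the NumPy version, so treat it as a starting point for compilation rather than a speedup in itself.

```python
import numpy as np


def count_inliers(n, xyz, threshold):
    """Count rows p of xyz with |n . p| < threshold, one row at a time."""
    count = 0
    num_points = xyz.shape[0]
    i = 0
    while i < num_points:
        # Dot product written out by hand: no temporary arrays are created.
        d = n[0] * xyz[i, 0] + n[1] * xyz[i, 1] + n[2] * xyz[i, 2]
        if d < 0.0:
            d = -d
        if d < threshold:
            count += 1
        i += 1
    return count
```

It computes the same count as `(np.abs(n @ xyz.T) < threshold).sum()` from the question's inner loop, without building the intermediate arrays.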
