I have a range image of a scene. I traverse the image and calculate the average depth change under a detection window. The detection window changes size based on the average depth of the pixels surrounding the current position. I accumulate the average change to produce a simple response image.
Most of the time is spent in the for loop; it takes about 40+ seconds for a 512x52 image on my machine. I was hoping to speed it up. Is there a more efficient/faster way to traverse the image? Is there a better pythonic/numpy/scipy way to visit each pixel? Or should I go and learn Cython?
EDIT: I have reduced the run time to about 18 seconds by using scipy.misc.imread() instead of skimage.io.imread(). Not sure what the difference is; I will try to investigate.
Here is a simplified version of the code:
```python
import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0, 0, 0, 0)
    r0 = max(a[0], b[0])
    c0 = max(a[1], b[1])
    r1 = min(a[2], b[2])
    c1 = min(a[3], b[3])
    # Do we have a valid intersection?
    if r1 > r0 and c1 > c0:
        rect = (r0, c0, r1, c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src)      # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0)   # integrate to find num points which have depth
boundary = (0, 0, depth_src.shape[0] - 1, depth_src.shape[1] - 1)  # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602, 2.044)  # width, height

start_time = time.time()

for (r, c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel
    r0, c0, r1, c1 = intersect((r - 1, c - 1, r + 1, c + 1), boundary)

    # Calculate average depth of points around current pixel
    scale = integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0

    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale / 2)
    c0 = c - (model[1] * scale / 2)
    r1 = r + (model[0] * scale / 2)
    c1 = c + (model[1] * scale / 2)

    # Use the scale-optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0, c0, r1, c1), boundary)
    depth_count = integrate(depth_pts, r0, c0, r1, c1)
    if depth_count:
        depth_sum = integrate(depth_intg, r0, c0, r1, c1)
        avg_change = depth_sum / depth_count
        # Accumulate response
        out_img[r0:r1, c0:c1] += avg_change

print time.time() - start_time, " seconds"

plt.imshow(out_img)
plt.gray()
plt.show()
```
Answer 0 (score: 3)
I am not sure whether numpy has a built-in sliding window sum, but this answer suggests a few approaches using stride tricks: https://stackoverflow.com/a/12713297/1828289. You can certainly achieve the same thing with one loop over columns and one loop over rows (taking slices to extract a row/column).
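For reference, a minimal sketch of that stride-tricks idea using NumPy's `sliding_window_view` (available since NumPy 1.20; the linked answer builds an equivalent view manually with `as_strided`):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.arange(36, dtype=float).reshape(6, 6)
K = 3

# View of every KxK window of img, shape (H-K+1, W-K+1, K, K); no data is copied
windows = sliding_window_view(img, (K, K))

# Summing over the last two axes gives all sliding window sums at once
sums = windows.sum(axis=(-2, -1))
# sums[i, j] == img[i:i+K, j:j+K].sum()
```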
Example:
```python
import numpy

# img is a 2D ndarray
# K is the size of sums to calculate using a sliding window
row_sums = numpy.zeros_like(img)
for i in range(img.shape[0]):
    if i >= K:
        row_sums[i, :] = row_sums[i - 1, :] - img[i - K, :] + img[i, :]
    elif i > 0:
        row_sums[i, :] = row_sums[i - 1, :] + img[i, :]
    else:  # i == 0
        row_sums[i, :] = img[i, :]

col_sums = numpy.zeros_like(img)
for j in range(img.shape[1]):
    if j >= K:
        col_sums[:, j] = col_sums[:, j - 1] - row_sums[:, j - K] + row_sums[:, j]
    elif j > 0:
        col_sums[:, j] = col_sums[:, j - 1] + row_sums[:, j]
    else:  # j == 0
        col_sums[:, j] = row_sums[:, j]

# here col_sums[i, j] equals numpy.sum(img[i - K + 1:i + 1, j - K + 1:j + 1])
# for i >= K - 1 and j >= K - 1
# the first K - 1 rows and columns of col_sums contain partial sums and can be ignored
```
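The same K×K sums can also be obtained without the explicit loops by building an integral image with cumulative sums (a sketch under the same definitions, not part of the original answer):

```python
import numpy as np

def sliding_window_sums(img, K):
    """Sum over every KxK window of img via an integral image.

    out[i, j] is the sum of img[i:i+K, j:j+K];
    output shape is (H - K + 1, W - K + 1).
    """
    # Pad the integral image with a leading zero row/column so each
    # window sum becomes a four-corner difference
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii[K:, K:] - ii[:-K, K:] - ii[K:, :-K] + ii[:-K, :-K]

img = np.arange(25, dtype=float).reshape(5, 5)
sums = sliding_window_sums(img, 3)
# sums[0, 0] == img[0:3, 0:3].sum()
```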
How would you best apply this to your case? I think you might want to precompute the integrals for 3x3 (average depth) and a few larger sizes, and use the 3x3 value to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm). The range of larger sizes you need might be limited, or artificially limiting it and simply picking the closest size might still work well enough. Calculating all the integrals with sliding sums is so efficient that I am almost certain it is worth it even for many sizes you would never use at a particular pixel, especially if some of the sizes are large.
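A hypothetical sketch of that idea, using `scipy.ndimage.uniform_filter` to precompute mean depth at a handful of window sizes and then picking, per pixel, the precomputed size nearest to the desired one (the size list and the toy scaling are illustrative assumptions, not from the question):

```python
import numpy as np
from scipy.ndimage import uniform_filter

depth = np.random.rand(64, 64)        # stand-in for the range image
sizes = np.array([3, 7, 15, 31])      # artificially limited set of window sizes

# Mean depth over each window size, stacked into one (n_sizes, H, W) array
means = np.stack([uniform_filter(depth, size=s) for s in sizes])

# Desired window size per pixel, derived here from the 3x3 mean (toy scaling)
desired = 3 + 28 * means[0]

# Index of the precomputed size closest to the desired size at each pixel
nearest = np.abs(sizes[:, None, None] - desired).argmin(axis=0)

# Response: the precomputed mean at the chosen size, per pixel
response = np.take_along_axis(means, nearest[None], axis=0)[0]
```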
P.S. This is a minor addition, but you may want to avoid calling intersect() for every pixel: either (a) only process pixels that are farther from the edge than the largest integral size, or (b) add margins to the image of the largest integral size on all sides, filling the margins with zeros or NaNs, or (c) (best approach) use slicing to handle it automatically: a slice index outside the bounds of an ndarray is automatically clipped to the bounds, except of course for negative indices.
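Option (b) might look like this (a sketch; `K` stands in for the largest window size and is an assumption):

```python
import numpy as np

K = 7                                 # assumed largest window size
img = np.random.rand(32, 32)

# Pad all sides by the largest window size, filling the margins with zeros
padded = np.pad(img, K, mode='constant', constant_values=0)

# A KxK window centred on original pixel (r, c) is now always in bounds
r, c = 0, 0                           # even a corner pixel works
win = padded[r + K - K // 2 : r + K + K // 2 + 1,
             c + K - K // 2 : c + K + K // 2 + 1]
# win.shape == (K, K)
```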
Edit: added an example of sliding window sums.