Question

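Q: How can I speed this up?

Below is my implementation of Matlab's im2col 'sliding', with the additional feature of returning every n'th column. The function takes an image (or any 2-dimensional array) and slides from left to right, top to bottom, picking off every overlapping sub-image of a given size, and returns an array whose columns are the sub-images.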

import numpy as np

def im2col_sliding(image, block_size, skip=1):

    rows, cols = image.shape
    horz_blocks = cols - block_size[1] + 1
    vert_blocks = rows - block_size[0] + 1

    output_vectors = np.zeros((block_size[0] * block_size[1], horz_blocks * vert_blocks))
    itr = 0
    for v_b in range(vert_blocks):
        for h_b in range(horz_blocks):
            output_vectors[:, itr] = image[v_b: v_b + block_size[0], h_b: h_b + block_size[1]].ravel()
            itr += 1

    return output_vectors[:, ::skip]

Example:

a = np.arange(16).reshape(4, 4)
print(a)
print(im2col_sliding(a, (2, 2)))  # return every overlapping 2x2 patch
print(im2col_sliding(a, (2, 2), 4))  # return every 4th vector

Returns:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
[[  0.   1.   2.   4.   5.   6.   8.   9.  10.]
 [  1.   2.   3.   5.   6.   7.   9.  10.  11.]
 [  4.   5.   6.   8.   9.  10.  12.  13.  14.]
 [  5.   6.   7.   9.  10.  11.  13.  14.  15.]]
[[  0.   5.  10.]
 [  1.   6.  11.]
 [  4.   9.  14.]
 [  5.  10.  15.]]

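The performance isn't great, especially considering that whether I call im2col_sliding(big_matrix, (8, 8)) (62001 columns) or im2col_sliding(big_matrix, (8, 8), 10) (6201 columns; keeping only every 10th vector), it takes the same amount of time [where big_matrix is of size 256 x 256].

I'm looking for any ideas to speed this up.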
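The skip saves no time in the loop version because every window is extracted before output_vectors[:, ::skip] throws most of them away. A minimal sketch of one way to avoid that in pure Python, assuming the kept columns are the row-major window indices 0, skip, 2*skip, ... (which is what the final slice selects). The name im2col_sliding_skip_first is made up for illustration, and unlike the original it keeps the input dtype instead of returning floats:

```python
import numpy as np

def im2col_sliding_skip_first(image, block_size, skip=1):
    """Loop version that only visits the windows that are kept."""
    rows, cols = image.shape
    bh, bw = block_size
    # Clamp to zero so an oversized block yields an empty result.
    horz_blocks = max(cols - bw + 1, 0)
    vert_blocks = max(rows - bh + 1, 0)

    # Row-major window indices that survive the skip.
    kept = range(0, horz_blocks * vert_blocks, skip)
    out = np.zeros((bh * bw, len(kept)), dtype=image.dtype)
    for col, idx in enumerate(kept):
        # Recover the window's top-left corner from its linear index.
        v_b, h_b = divmod(idx, horz_blocks)
        out[:, col] = image[v_b:v_b + bh, h_b:h_b + bw].ravel()
    return out

a = np.arange(16).reshape(4, 4)
print(im2col_sliding_skip_first(a, (2, 2), 4))
```

This still loops in Python, so it only reduces the work in proportion to skip; it does not remove the per-window overhead.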

Answer 1

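Approach #1

We can use broadcasting here to get the indices of all the sliding windows in one go, and then index with them to get a vectorized solution. This is inspired by Efficient Implementation of im2col and col2im.

Here's the implementation -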

def im2col_sliding_broadcasting(A, BSZ, stepsize=1):
    # Parameters
    M,N = A.shape
    col_extent = N - BSZ[1] + 1
    row_extent = M - BSZ[0] + 1

    # Get Starting block indices
    start_idx = np.arange(BSZ[0])[:,None]*N + np.arange(BSZ[1])

    # Get offsetted indices across the height and width of input array
    offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)

    # Get all actual indices & index into input array for final output
    return np.take(A, start_idx.ravel()[:,None] + offset_idx.ravel()[::stepsize])
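To make the index arithmetic concrete, here is a small walk-through of the two index grids for a 4x5 input and a 2x3 block. It is only an illustration of the same expressions used in the function above; the input is chosen as np.arange(20) so that each element equals its own flat index.

```python
import numpy as np

A = np.arange(20).reshape(4, 5)   # each value equals its flat index
BSZ = (2, 3)
M, N = A.shape

# Flat indices of the elements inside the top-left window, shape (2, 3).
start_idx = np.arange(BSZ[0])[:, None] * N + np.arange(BSZ[1])

# Flat index of the top-left corner of every window, shape (3, 3).
offset_idx = np.arange(M - BSZ[0] + 1)[:, None] * N + np.arange(N - BSZ[1] + 1)

print(start_idx.ravel())    # [0 1 2 5 6 7]
print(offset_idx.ravel())   # [ 0  1  2  5  6  7 10 11 12]

# A column of window-local indices plus a row of window offsets broadcasts
# to one flat index per (element-in-window, window) pair, shape (6, 9).
idx = start_idx.ravel()[:, None] + offset_idx.ravel()
cols = np.take(A, idx)
print(cols.shape)           # (6, 9)
```

Slicing offset_idx.ravel() with [::stepsize] before the addition means the skipped windows are never indexed at all.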

Approach #2

Using knowledge of NumPy array strides, which lets us create such sliding windows as a view, we get another efficient solution -

def im2col_sliding_strided(A, BSZ, stepsize=1):
    # Parameters
    m,n = A.shape
    s0, s1 = A.strides    
    nrows = m-BSZ[0]+1
    ncols = n-BSZ[1]+1
    shp = BSZ[0],BSZ[1],nrows,ncols
    strd = s0,s1,s0,s1

    out_view = np.lib.stride_tricks.as_strided(A, shape=shp, strides=strd)
    return out_view.reshape(BSZ[0]*BSZ[1],-1)[:,::stepsize]
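On newer NumPy (1.20 or later) the same windowed view is available through numpy.lib.stride_tricks.sliding_window_view, which computes the shape and strides itself. This is a sketch of an alternative, not part of the original answer; the function name im2col_sliding_swv is made up here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def im2col_sliding_swv(A, BSZ, stepsize=1):
    # View of shape (nrows, ncols, BSZ[0], BSZ[1]); no data is copied yet.
    windows = sliding_window_view(A, (BSZ[0], BSZ[1]))
    # Flatten each window into a row (this copies), then put windows in columns.
    return windows.reshape(-1, BSZ[0] * BSZ[1]).T[:, ::stepsize]

a = np.arange(20).reshape(4, 5)
print(im2col_sliding_swv(a, (2, 3)))
print(im2col_sliding_swv(a, (2, 3), stepsize=2))
```

Note that the window axes come last in this view, so the reshape and transpose differ from the as_strided version, where the block axes come first.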

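Approach #3

The strided method from the previous approach has been incorporated into the scikit-image module, which makes for a more concise version, like so -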

from skimage.util import view_as_windows as viewW

def im2col_sliding_strided_v2(A, BSZ, stepsize=1):
    return viewW(A, (BSZ[0],BSZ[1])).reshape(-1,BSZ[0]*BSZ[1]).T[:,::stepsize]

Sample run -

In [106]: a      # Input array
Out[106]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [107]: im2col_sliding_broadcasting(a, (2,3))
Out[107]: 
array([[ 0,  1,  2,  5,  6,  7, 10, 11, 12],
       [ 1,  2,  3,  6,  7,  8, 11, 12, 13],
       [ 2,  3,  4,  7,  8,  9, 12, 13, 14],
       [ 5,  6,  7, 10, 11, 12, 15, 16, 17],
       [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
       [ 7,  8,  9, 12, 13, 14, 17, 18, 19]])

In [108]: im2col_sliding_broadcasting(a, (2,3), stepsize=2)
Out[108]: 
array([[ 0,  2,  6, 10, 12],
       [ 1,  3,  7, 11, 13],
       [ 2,  4,  8, 12, 14],
       [ 5,  7, 11, 15, 17],
       [ 6,  8, 12, 16, 18],
       [ 7,  9, 13, 17, 19]])

Runtime test

In [183]: img = np.random.randint(0,255,(1024,1024))

In [184]: %timeit im2col_sliding(img, (8,8), skip=1)
     ...: %timeit im2col_sliding_broadcasting(img, (8,8), stepsize=1)
     ...: %timeit im2col_sliding_strided(img, (8,8), stepsize=1)
     ...: %timeit im2col_sliding_strided_v2(img, (8,8), stepsize=1)
     ...: 
1 loops, best of 3: 1.29 s per loop
1 loops, best of 3: 226 ms per loop
10 loops, best of 3: 84.5 ms per loop
10 loops, best of 3: 111 ms per loop

In [185]: %timeit im2col_sliding(img, (8,8), skip=4)
     ...: %timeit im2col_sliding_broadcasting(img, (8,8), stepsize=4)
     ...: %timeit im2col_sliding_strided(img, (8,8), stepsize=4)
     ...: %timeit im2col_sliding_strided_v2(img, (8,8), stepsize=4)
     ...: 
1 loops, best of 3: 1.31 s per loop
10 loops, best of 3: 104 ms per loop
10 loops, best of 3: 84.4 ms per loop
10 loops, best of 3: 109 ms per loop

That is roughly a 15x speedup with the strided method over the original loopy version (1.29 s vs. 84.5 ms)!

Answer 2

For a sliding window over multiple image channels, we can use an updated version of the code provided by Divakar @ Implement MATLAB's im2col 'sliding' in Python, i.e.

import numpy as np
A = np.random.randint(0,9,(2,4,4)) # Sample input array
B = [2,2]                          # Sample blocksize (rows x columns)
skip = [2,2]                       # Step between windows (rows, columns)
# Parameters 
D,M,N = A.shape
col_extent = N - B[1] + 1
row_extent = M - B[0] + 1

# Get Starting block indices
start_idx = np.arange(B[0])[:,None]*N + np.arange(B[1])

# Generate depth indices
didx=M*N*np.arange(D)
start_idx=(didx[:,None]+start_idx.ravel()).reshape((-1,B[0],B[1]))

# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)

# Get all actual indices & index into input array for final output
out = np.take(A, start_idx.ravel()[:,None] + offset_idx[::skip[0],::skip[1]].ravel())

Test sample run

A=
[[[6 2 8 5]
[6 4 7 6]
[8 6 5 2]
[3 1 3 7]]

[[6 0 4 3]
[7 6 4 6]
[2 6 7 1]
[7 6 7 7]]]

out=
[6 8 8 5]
[2 5 6 2]
[6 7 3 3]
[4 6 1 7]
[6 4 2 7]
[0 3 6 1]
[7 4 7 7]
[6 6 6 7]
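For reuse, the script above can be wrapped in a function. The following is a minimal sketch under the same assumptions (A has shape (D, M, N), skip is a (rows, columns) pair); the name im2col_channels is hypothetical. It is checked here against the sample input and output shown above.

```python
import numpy as np

def im2col_channels(A, B, skip=(1, 1)):
    """Sliding windows over every channel of a (D, M, N) array."""
    D, M, N = A.shape
    # Flat indices of the top-left window within one channel.
    start_idx = np.arange(B[0])[:, None] * N + np.arange(B[1])
    # Shift those indices by M*N per channel and flatten: D * B[0] * B[1] rows.
    didx = M * N * np.arange(D)
    start_idx = (didx[:, None] + start_idx.ravel()).ravel()
    # Flat offsets of the window corners, with the skip applied per axis.
    offset_idx = np.arange(M - B[0] + 1)[:, None] * N + np.arange(N - B[1] + 1)
    offsets = offset_idx[::skip[0], ::skip[1]].ravel()
    return np.take(A, start_idx[:, None] + offsets)

A = np.array([[[6, 2, 8, 5], [6, 4, 7, 6], [8, 6, 5, 2], [3, 1, 3, 7]],
              [[6, 0, 4, 3], [7, 6, 4, 6], [2, 6, 7, 1], [7, 6, 7, 7]]])
out = im2col_channels(A, (2, 2), (2, 2))
print(out.shape)   # (8, 4): 2 channels * 4 block elements, 4 windows
```

Each column stacks the flattened block from channel 0 on top of the block from channel 1, which is the layout a convolution expressed as a matrix product would expect.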

Answer 3

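I've implemented a fast solution using the Numba JIT compiler. Depending on block size and skip size, it gives speedups ranging from 5.67x to 3597x.

Speedup here means how many times faster the numba algorithm is than the original algorithm, e.g. a 20x speedup means that if the original algorithm took 200ms, the fast numba algorithm took 10ms.

My code needs the following pip modules to be installed once through python -m pip install numpy numba timerit matplotlib.

Below is the code, followed by the speedup plots and then the console output with the time measurements.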


import numpy as np

# ----- Original Implementation -----

def im2col_sliding(image, block_size, skip = 1):
    rows, cols = image.shape
    horz_blocks = cols - block_size[1] + 1
    vert_blocks = rows - block_size[0] + 1
    
    if vert_blocks <= 0 or horz_blocks <= 0:
        return np.zeros((block_size[0] * block_size[1], 0), dtype = image.dtype)

    output_vectors = np.zeros((block_size[0] * block_size[1], horz_blocks * vert_blocks), dtype = image.dtype)
    itr = 0
    
    for v_b in range(vert_blocks):
        for h_b in range(horz_blocks):
            output_vectors[:, itr] = image[v_b: v_b + block_size[0], h_b: h_b + block_size[1]].ravel()
            itr += 1

    return output_vectors[:, ::skip]


# ----- Fast Numba Implementation -----
    
import numba

@numba.njit(cache = True)
def im2col_sliding_numba(image, block_size, skip = 1):
    assert skip >= 1
    rows, cols = image.shape
    horz_blocks = cols - block_size[1] + 1
    vert_blocks = rows - block_size[0] + 1
    
    if vert_blocks <= 0 or horz_blocks <= 0:
        return np.zeros((block_size[0] * block_size[1], 0), dtype = image.dtype)
    
    res = np.zeros((block_size[0] * block_size[1], (horz_blocks * vert_blocks + skip - 1) // skip), dtype = image.dtype)
    itr, to_skip, v_b = 0, 0, 0
    
    while True:
        v_b += to_skip // horz_blocks
        if v_b >= vert_blocks:
            break
        h_b_start = to_skip % horz_blocks
        h_cnt = (horz_blocks - h_b_start + skip - 1) // skip
        for i, h_b in zip(range(itr, itr + h_cnt), range(h_b_start, horz_blocks, skip)):
            ii = 0
            for iv in range(v_b, v_b + block_size[0]):
                for ih in range(h_b, h_b + block_size[1]):
                    res[ii, i] = image[iv, ih]
                    ii += 1
        to_skip = skip - (horz_blocks - h_b_start - skip * (h_cnt - 1))
        itr += h_cnt
        v_b += 1
        
    assert itr == res.shape[1]#, (itr, res.shape)

    return res


# ----- Testing -----

from timerit import Timerit
Timerit._default_asciimode = True

side = 256
a = np.random.randint(0, 256, (side, side), dtype = np.uint8)

stats = []

for block_size in [16, 8, 4, 2, 1]:
    for skip_size in [1, 2, 5, 11, 23]:
        print(f'block_size {block_size} skip_size {skip_size}', flush = True)
        for ifn, f in enumerate([im2col_sliding, im2col_sliding_numba]):
            print(f'{f.__name__}: ', end = '', flush = True)
            tim = Timerit(num = 3, verbose = 1)
            for i, t in enumerate(tim):
                if i == 0 and ifn == 1:
                    f(a, (block_size, block_size), skip_size)
                with t:
                    r = f(a, (block_size, block_size), skip_size)
            rt = tim.mean()
            if ifn == 0:
                bt, ba = rt, r
            else:
                assert np.array_equal(ba, r)
                print(f'speedup {round(bt / rt, 2)}x')
                stats.append({
                    'block_size': block_size,
                    'skip_size': skip_size,
                    'speedup': bt / rt,
                })

stats = sorted(stats, key = lambda e: e['speedup'])

import math, matplotlib, matplotlib.pyplot as plt

x = np.arange(len(stats))
y = np.array([e['speedup'] for e in stats])

plt.rcParams['figure.figsize'] = (12.8, 7.2)

for scale in ['linear', 'log']:
    plt.clf()
    plt.xlabel('iteration')
    plt.ylabel(f'speedup_{scale}')
    plt.yscale(scale)
    plt.scatter(x, y, marker = '.')
    for i in range(x.size):
        plt.annotate(
            (f"b{str(stats[i]['block_size']).zfill(2)}s{str(stats[i]['skip_size']).zfill(2)}\n" +
             f"x{round(stats[i]['speedup'], 2 if stats[i]['speedup'] < 100 else 1 if stats[i]['speedup'] < 1000 else None)}"),
            (x[i], y[i]), fontsize = 'small',
        )
    plt.subplots_adjust(left = 0.055, right = 0.99, bottom = 0.08, top = 0.99)
    plt.xlim(left = -0.1)
    if scale == 'linear':
        ymin, ymax = np.amin(y), np.amax(y)
        plt.ylim((ymin - (ymax - ymin) * 0.02, ymax + (ymax - ymin) * 0.05))
        plt.yticks([ymin] + [e for e in plt.yticks()[0] if ymin + 0.01 < e < ymax - 0.01] + [ymax])
        #plt.gca().get_yaxis().set_major_formatter(matplotlib.ticker.FormatStrFormatter('%.1f'))
    plt.savefig(f'im2col_numba_{scale}.png', dpi = 150)
    plt.show()

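The plots below have the iteration as the x axis and the speedup as the y axis; the first plot has a linear y axis and the second a logarithmic y axis. Each point also carries a label bXXsYYxZZ, where XX is the block size, YY is the skip (step) size, and ZZ is the speedup.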

Linear plot: [image not available]

Logarithmic plot: [image not available]

Console output: [not available]

Answer 4

To further improve performance (e.g. for convolution), we can also use a batch implementation based on the extended code provided by M Elyia @ Implement Matlab's im2col 'sliding' in python, i.e.

import numpy as np

A = np.arange(3*1*4*4).reshape(3,1,4,4)+1 # 3 Sample input array with 1 channel
B = [2,2] # Sample blocksize (rows x columns)
skip = [2,2]

# Parameters 
batch, D,M,N = A.shape
col_extent = N - B[1] + 1
row_extent = M - B[0] + 1

# Get batch block indices
batch_idx = np.arange(batch)[:, None, None] * D * M * N

# Get Starting block indices
start_idx = np.arange(B[0])[None, :,None]*N + np.arange(B[1])

# Generate depth indices
didx=M*N*np.arange(D)
start_idx=(didx[None, :, None]+start_idx.ravel()).reshape((-1,B[0],B[1]))

# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[None, :, None]*N + np.arange(col_extent)

# Get all actual indices & index into input array for final output
act_idx = (batch_idx + 
    start_idx.ravel()[None, :, None] + 
    offset_idx[:,::skip[0],::skip[1]].ravel())

out = np.take(A, act_idx)

Test sample run:

A = 
[[[[ 1  2  3  4]
   [ 5  6  7  8]
   [ 9 10 11 12]
   [13 14 15 16]]]


 [[[17 18 19 20]
   [21 22 23 24]
   [25 26 27 28]
   [29 30 31 32]]]


 [[[33 34 35 36]
   [37 38 39 40]
   [41 42 43 44]
   [45 46 47 48]]]] 


out = 
[[[ 1  3  9 11]
  [ 2  4 10 12]
  [ 5  7 13 15]
  [ 6  8 14 16]]

 [[17 19 25 27]
  [18 20 26 28]
  [21 23 29 31]
  [22 24 30 32]]

 [[33 35 41 43]
  [34 36 42 44]
  [37 39 45 47]
  [38 40 46 48]]]

Answer 5

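I don't think you can do much better. Clearly, you have to run a loop of size

(cols - block_size[1]) * (rows - block_size[0])

However, your example has 3 patches along each dimension, not 2, since your code adds 1 to each factor.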
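The window counts can be checked with a few lines of arithmetic. This sketch follows the question's own code (horz_blocks * vert_blocks, each with the + 1) and reproduces the column counts quoted in the question; the helper name num_windows is made up for illustration.

```python
def num_windows(rows, cols, block_size, skip=1):
    """Number of columns im2col 'sliding' returns for the given sizes."""
    vert_blocks = max(rows - block_size[0] + 1, 0)
    horz_blocks = max(cols - block_size[1] + 1, 0)
    total = vert_blocks * horz_blocks
    # Keeping every skip-th column is a ceiling division of the total.
    return -(-total // skip)

print(num_windows(4, 4, (2, 2)))          # 9   (3 per dimension)
print(num_windows(256, 256, (8, 8)))      # 62001
print(num_windows(256, 256, (8, 8), 10))  # 6201
```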

Answer 6

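You can also add a further optimization to M Eliya's answer (although it is not that significant).

Instead of "applying" the skips at the end, you can apply them while generating the offset arrays, so instead of: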

# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)

# Get all actual indices & index into input array for final output
out = np.take (A,start_idx.ravel()[:,None] + offset_idx[::skip[0],::skip[1]].ravel())

You can add the skips using the step parameter of numpy's arange function:

# Get offsetted indices across the height and width of input array and add skips
offset_idx = np.arange(row_extent, step=skip[0])[:, None] * N + np.arange(col_extent, step=skip[1])

and afterwards just add the offset array without the [::] indexing:

# Get all actual indices & index into input array for final output

out = np.take(A, start_idx.ravel()[:, None] + offset_idx.ravel())
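A quick way to confirm that the two formulations select the same offsets is to build both and compare them. The sizes below are arbitrary values chosen for illustration, not taken from the answer.

```python
import numpy as np

# Arbitrary illustrative sizes (not from the original answer).
row_extent, col_extent, N = 7, 9, 10
skip = (3, 2)

# Original: build the full offset grid, then slice out the kept windows.
sliced = (np.arange(row_extent)[:, None] * N
          + np.arange(col_extent))[::skip[0], ::skip[1]]

# Optimized: only ever generate the kept rows and columns.
stepped = (np.arange(row_extent, step=skip[0])[:, None] * N
           + np.arange(col_extent, step=skip[1]))

print(np.array_equal(sliced, stepped))   # True
print(stepped.shape)                     # (3, 5)
```

The saving comes from never materializing the full row_extent x col_extent grid, which is why it grows with the skip values.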

With small skip values it barely saves any time:

In[25]:
A = np.random.randint(0,9,(3, 1024, 1024))
B = [2, 2]
skip = [2, 2]

In[26]: %timeit im2col(A, B, skip)
10 loops, best of 3: 19.7 ms per loop

In[27]: %timeit im2col_optimized(A, B, skip)
100 loops, best of 3: 17.5 ms per loop

But the larger the skip values, the more time it saves:

In[28]: skip = [10, 10]
In[29]: %timeit im2col(A, B, skip)
100 loops, best of 3: 3.85 ms per loop

In[30]: %timeit im2col_optimized(A, B, skip)
1000 loops, best of 3: 1.02 ms per loop

A = np.random.randint(0,9,(3, 2000, 2000))
B = [10, 10]
skip = [10, 10]

In[43]: %timeit im2col(A, B, skip)
10 loops, best of 3: 87.8 ms per loop

In[44]: %timeit im2col_optimized(A, B, skip)
10 loops, best of 3: 76.3 ms per loop

Implement MATLAB's im2col 'sliding' in Python
