Sliding window - how to get the window location on the image?

Time: 2014-12-20 20:53:50

Tags: python numpy computer-vision sliding-window

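Referring to this great sliding window implementation in Python: https://github.com/keepitsimple/ocrtest/blob/master/sliding_window.py#blob_contributors_box, my question is: where in the code can I actually see the position of the current window on the image? Or how can I get hold of its position?

But I don't just need the dimensions, such as width and height; I also need to know exactly where in the image the window is extracted from, in terms of pixel coordinates or of which rows/columns of the image.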

2 answers:

Answer 0: (score: 3)

It might be easier to understand what's going on if you try using flatten=False to create a "grid" of windows onto the image:

import numpy as np
from scipy.misc import lena
from matplotlib import pyplot as plt

img = lena()
print(img.shape)
# (512, 512)

# make a 64x64 pixel sliding window on img. 
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

print(win.shape)
# (8, 8, 64, 64)
# i.e. (img_height / win_height, img_width / win_width, win_height, win_width)

plt.imshow(win[4, 4, ...])
# grid position [4, 4] contains Lena's eye and nose


def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right

# check for grid position [3, 4]
t, b, l, r = get_win_pixel_coords((3, 4), (64, 64))

print(np.all(img[t:b, l:r] == win[3, 4, :, :]))
# True

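With flatten=True, the 8x8 grid of 64x64 pixel windows is simply flattened into a 64-long vector of 64x64 pixel windows. In that case you could use something like np.unravel_index to convert the 1D vector index into a tuple of grid indices, then use these to get the pixel coordinates as above: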

win = sliding_window(img, (64, 64), flatten=True)

grid_pos = np.unravel_index(12, (8, 8))
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))

print(np.all(img[t:b, l:r] == win[12]))
# True
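The index arithmetic above can be tried without the linked sliding_window function. The following is a minimal sketch, assuming non-overlapping windows and an image whose sides are exact multiples of the window size. The random test array and the helper name make_window_grid are illustrative and not part of the original code.

```python
import numpy as np

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    # same helper as above, repeated so this sketch runs on its own
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    return gr * sr, gr * sr + wr, gc * sc, gc * sc + wc

def make_window_grid(img, win_shape):
    # hypothetical stand-in for sliding_window(..., flatten=False):
    # split a 2D image into a grid of non-overlapping windows
    wr, wc = win_shape
    nr, nc = img.shape[0] // wr, img.shape[1] // wc
    return img[:nr * wr, :nc * wc].reshape(nr, wr, nc, wc).swapaxes(1, 2)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512))

grid = make_window_grid(img, (64, 64))   # shape (8, 8, 64, 64)
flat = grid.reshape(-1, 64, 64)          # 64 windows in a row, like flatten=True

flat_idx = 12
# flat index 12 in an 8x8 grid is grid row 1, grid column 4
grid_pos = np.unravel_index(flat_idx, grid.shape[:2])
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))

print(grid_pos, (t, b, l, r))
print(np.array_equal(img[t:b, l:r], flat[flat_idx]))
```

The flat index is only meaningful together with the grid shape it was flattened from, which is why unravel_index needs both.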




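Perhaps I wasn't clear enough: you can already do this using something like my get_win_pixel_coords() function, which gives you the top, bottom, left and right coordinates of the window relative to the image. For example: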

win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(win[4, 4])
ax1.plot(8, 9, 'oy')         # position of Lena's eye, relative to this window

t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))

ax2.imshow(img)
# plot() takes (x, y), i.e. (column, row): offset x by left and y by top
ax2.plot(l + 8, t + 9, 'oy') # position of Lena's eye, relative to whole image
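Since plot() expects (x, y), i.e. (column, row), while array indexing uses [row, column], the two offsets are easy to mix up. Here is a small sketch of the conversion, assuming the (top, bottom, left, right) convention of get_win_pixel_coords(). The helper name window_point_to_image is made up for illustration.

```python
def window_point_to_image(point_xy, tblr):
    """Map an (x, y) point inside a window to (x, y) in the full image.

    tblr is (top, bottom, left, right) in image pixel coordinates.
    """
    x, y = point_xy
    top, bottom, left, right = tblr
    if not (0 <= x < right - left and 0 <= y < bottom - top):
        raise ValueError('point lies outside the window')
    # x runs along columns, so it is offset by left; y runs along rows
    return left + x, top + y

# window at grid position (3, 4) of a non-overlapping 64x64 grid
tblr = (192, 256, 256, 320)
print(window_point_to_image((8, 9), tblr))
# (264, 201)
```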




So I'm guessing that in that case I should make the grid equal to the dimensions of the original image, right? (Instead of using 8x8.)




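As I said, making the 'grid size' equal to the image dimensions would be pointless, since each window would then contain only a single pixel (at least, assuming the windows don't overlap). In a call such as

grid_pos = np.unravel_index(12, (240, 360))

the 12 refers to the index into the flattened grid of windows, e.g.:

x = np.arange(25).reshape(5, 5)    # 5x5 grid containing numbers from 0 ... 24
x_flat = x.ravel()                 # flatten it into a 25-long vector
print(x_flat[12])                  # the element at index 12 in the flattened vector
# 12
row, col = np.unravel_index(12, (5, 5))    # corresponding row/col index in x
print(x[row, col])
# 12

As I said, you can already get the position of the window relative to the image using the top, bottom, left, right coordinates returned by get_win_pixel_coords(). If you really want, you could wrap it up into a single function: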



def get_pixels_and_coords(win_grid, grid_pos):
    pix = win_grid[grid_pos]
    tblr = get_win_pixel_coords(grid_pos, pix.shape)
    return pix, tblr

# e.g.:
pix, tblr = get_pixels_and_coords(win, (3, 4))
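Note that get_pixels_and_coords() reuses the window shape as the shift size, so it only holds for non-overlapping windows. For overlapping windows the step has to be passed explicitly. Here is a hedged sketch that uses numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) in place of the linked sliding_window function. The names windows_with_step and get_pixels_and_coords_step are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows_with_step(img, win_shape, step):
    # all windows at stride 1, then keep every step-th one along each axis;
    # the result has shape (n_rows, n_cols, win_height, win_width)
    return sliding_window_view(img, win_shape)[::step[0], ::step[1]]

def get_pixels_and_coords_step(win_grid, grid_pos, step):
    gr, gc = grid_pos
    pix = win_grid[gr, gc]
    wr, wc = pix.shape
    top, left = gr * step[0], gc * step[1]
    return pix, (top, top + wr, left, left + wc)

img = np.arange(512 * 512).reshape(512, 512)
win = windows_with_step(img, (64, 64), (32, 32))   # 50% overlap
pix, (t, b, l, r) = get_pixels_and_coords_step(win, (3, 4), (32, 32))

print(win.shape)
# (15, 15, 64, 64)
print((t, b, l, r))
# (96, 160, 128, 192)
print(np.array_equal(img[t:b, l:r], pix))
# True
```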

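Answer 1: (score: 0)

To update @ali_m's answer, since scipy.misc.lena() is no longer available in newer SciPy versions: here is an example using the RGB image scipy.misc.face(), with slight modifications to the sliding window source code provided in the OP.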

import numpy as np
from scipy.misc import face  # recent SciPy releases provide this as scipy.datasets.face
from matplotlib import pyplot as plt
from numpy.lib.stride_tricks import as_strided as ast

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right

def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple,
    even for one-dimensional shapes.

    Parameters
        shape - an int, or a tuple of ints

    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')

def sliding_window(a,ws,ss = None,flatten = True):
    '''Return a sliding window over a in any number of dimensions'''
    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)
    # convert ws, ss, and a.shape to numpy arrays
    ws = np.array(ws)
    ss = np.array(ss)
    shap = np.array(a.shape)
    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shap),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(\
        'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shap):
        raise ValueError(\
        'ws cannot be larger than a in any dimension.\
 a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    # how many slices will there be in each dimension?
    newshape = norm_shape(((shap - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    a = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return a
    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    #dim = filter(lambda i : i != 1,dim)
    return a.reshape(dim), newshape


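If a 96x96 window (i.e. tile x tile) is applied with 50% overlap in both directions to an image of shape (768, 1024, 3), the input image can be padded before creating the sliding windows, so that it divides into a whole number of windows with no remainder.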

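tile = 96  # window size in pixels (tile x tile)

img = face()
nxo,nyo,nzo = img.shape

# mirror the image to the right and downwards to have pixels to pad with
pad_img = np.vstack((np.hstack((img,np.fliplr(img))),np.flipud(np.hstack((img,np.fliplr(img))))))

# crop so that each side is rounded up to the next multiple of tile
pad_img = pad_img[:nxo+((-nxo) % tile),:nyo+((-nyo) % tile), :]

win, ind = sliding_window(pad_img, (96, 96,3), (48,48,3))
print(ind)
# (15, 21, 1, 96, 96, 3)
print(win.shape)
# (315, 96, 96, 3)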
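The mirror padding can also be written with np.pad. This is a sketch that assumes mode='symmetric' is the desired behaviour; that mode repeats the edge pixel, like stacking the image next to its flipped copy. It also assumes each side should be rounded up to the next multiple of the tile size. pad_to_multiple is an illustrative name, and the random array merely stands in for face().

```python
import numpy as np

def pad_to_multiple(img, tile):
    # number of extra rows/columns needed to reach the next multiple of tile
    pad_r = (-img.shape[0]) % tile
    pad_c = (-img.shape[1]) % tile
    # pad only at the bottom and right; leave the colour axis untouched
    return np.pad(img, ((0, pad_r), (0, pad_c), (0, 0)), mode='symmetric')

tile, step = 96, 48
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(768, 1024, 3), dtype=np.uint8)
pad_img = pad_to_multiple(img, tile)

# number of windows along each axis for the given window size and step
n_rows = (pad_img.shape[0] - tile) // step + 1
n_cols = (pad_img.shape[1] - tile) // step + 1
print(pad_img.shape, n_rows, n_cols)
# (768, 1056, 3) 15 21
```

Padding to a multiple of the tile size is enough here because the step (48) divides the tile size (96) evenly. For other step sizes the last window may still stop short of the image edge.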

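The grid of computational windows contains 15 rows and 21 columns, i.e. 315 computational windows. grid_pos can be determined from the index into the flattened vector of computational windows (i.e. win), together with ind[0] and ind[1]. If we are interested in the 239th computational window (index 239):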

grid_pos = np.unravel_index(239,(ind[0],ind[1]))
#(11, 8)


t, b, l, r = get_win_pixel_coords(grid_pos, (96, 96), (48,48))
print(np.all(pad_img[t:b, l:r] == win[239]))
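To list the pixel coordinates of every window at once, np.unravel_index also accepts an array of flat indices. Below is a minimal sketch, assuming the 15 x 21 grid, 96-pixel windows and 48-pixel step from above. all_window_coords is a hypothetical helper.

```python
import numpy as np

def all_window_coords(grid_shape, win_shape, step):
    # flat index -> (row, col) in the window grid, for all windows at once
    n = grid_shape[0] * grid_shape[1]
    gr, gc = np.unravel_index(np.arange(n), grid_shape)
    top = gr * step[0]
    left = gc * step[1]
    # one row per window: top, bottom, left, right
    return np.column_stack((top, top + win_shape[0], left, left + win_shape[1]))

coords = all_window_coords((15, 21), (96, 96), (48, 48))
print(coords.shape)
# (315, 4)
print(coords[239].tolist())
# [528, 624, 384, 480]
```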