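Sliding window - how to get the window location on the image?

Time: 2014-12-20 20:53:50

Tags: python numpy computer-vision sliding-window

Referring to this great sliding window implementation in Python: https://github.com/keepitsimple/ocrtest/blob/master/sliding_window.py#blob_contributors_box, my question is: where in the code can I actually see the location of the current window on the image? Or how can I grab its location?

After lines 72 and 85 I tried printing out shape and newstrides, but I am obviously not getting anywhere with this. In the norm_shape function I printed out tuple, but the output was just the window dimensions (if I understood that correctly).

But I need more than the dimensions (width and height): I also need to know exactly where in the image each window is extracted from, in terms of pixel coordinates or which rows/columns of the image.

2 answers:

Answer 0 (score: 3)

It might be easier to understand what's going on if you use flatten=False to create a 'grid' of windows over the image: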

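import numpy as np
from scipy.misc import lena
from matplotlib import pyplot as plt
# assumes the linked sliding_window.py has been saved next to this script
from sliding_window import sliding_window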

img = lena()
print(img.shape)
# (512, 512)

# make a 64x64 pixel sliding window on img. 
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

print(win.shape)
# (8, 8, 64, 64)
# i.e. (img_height / win_height, img_width / win_width, win_height, win_width)

plt.imshow(win[4, 4, ...])
plt.draw()
# grid position [4, 4] contains Lena's eye and nose
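
For readers who do not have the linked sliding_window.py at hand, the same kind of window grid can be sketched with NumPy's own numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later). This is a substitute for the function used above, not the linked implementation, and the synthetic img below is an assumption standing in for the test image:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic 512x512 "image" (an assumption; any 2D array works).
img = np.arange(512 * 512).reshape(512, 512)

win_h, win_w = 64, 64
# sliding_window_view yields every possible window position (step of 1 pixel);
# keeping every 64th position along each axis leaves only non-overlapping windows.
grid = sliding_window_view(img, (win_h, win_w))[::win_h, ::win_w]

print(grid.shape)  # (8, 8, 64, 64)

# The window at grid position [4, 4] covers rows 256..319 and columns 256..319.
assert np.array_equal(grid[4, 4], img[256:320, 256:320])
```

The result is a view into img, so no pixel data is copied when the grid is built.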

要获得相应的像素坐标,您可以执行以下操作:

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right

# check for grid position [3, 4]
t, b, l, r = get_win_pixel_coords((3, 4), (64, 64))

print(np.all(img[t:b, l:r] == win[3, 4, :, :]))
# True

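With flatten=True, the 8x8 grid of 64x64 pixel windows will just be flattened into a 64-long vector of 64x64 pixel windows. In that case you could use something like np.unravel_index to convert the 1D vector index into a tuple of grid indices, then use these to get the pixel coordinates as above: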

win = sliding_window(img, (64, 64), flatten=True)

grid_pos = np.unravel_index(12, (8, 8))
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))

print(np.all(img[t:b, l:r] == win[12]))
# True
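
To list the pixel bounds of every window in the flattened vector at once, the two steps above can be combined in a loop. A minimal sketch, assuming the non-overlapping 8x8 grid of 64x64 windows from this answer; window_bounds is a hypothetical helper name, and get_win_pixel_coords is repeated only so the block runs on its own:

```python
import numpy as np

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc
    return top, bottom, left, right

def window_bounds(grid_shape, win_shape, shift_size=None):
    """Yield (flat_index, (top, bottom, left, right)) for every window (hypothetical helper)."""
    n_windows = int(np.prod(grid_shape))
    for flat_idx in range(n_windows):
        # flat index -> (grid_row, grid_col)
        grid_pos = np.unravel_index(flat_idx, grid_shape)
        coords = get_win_pixel_coords(grid_pos, win_shape, shift_size)
        # convert NumPy integers to plain ints for readable output
        yield flat_idx, tuple(int(v) for v in coords)

bounds = dict(window_bounds((8, 8), (64, 64)))
print(bounds[12])  # (64, 128, 256, 320), i.e. grid position (1, 4)
```

The dictionary maps each flat window index to its (top, bottom, left, right) bounds, so bounds[i] can be looked up next to win[i].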

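OK, I'll try to address some of the questions you asked in the comments.

"I want the pixel location of the window relative to the original image, in actual pixel dimensions."

Perhaps I wasn't clear enough: you can already do this using something like my get_win_pixel_coords() function, which gives you the top, bottom, left and right coordinates of the window relative to the image. For example: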

win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(win[4, 4])
ax1.plot(8, 9, 'oy')         # position of Lena's eye, relative to this window (x=8, y=9)

t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))

ax2.imshow(img)
# plot() takes (x, y), i.e. (column, row): offset x by left and y by top
ax2.plot(l + 8, t + 9, 'oy') # position of Lena's eye, relative to whole image

plt.show()

Also note that I've updated get_win_pixel_coords() to deal with the case where shiftSize is not None (i.e. the windows do not perfectly tile the image without overlap).
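
As a quick check of the overlapping case, here is a sketch with a 64x64 window moved 10 pixels at a time over a 240x360 array. It uses NumPy's sliding_window_view in place of the linked sliding_window function and a synthetic array in place of a real image; both are assumptions made to keep the block self-contained:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc
    return top, bottom, left, right

img = np.arange(240 * 360).reshape(240, 360)   # synthetic image: 240 rows, 360 columns
win_shape, shift = (64, 64), (10, 10)

# keep every 10th window position along each axis (overlapping windows)
win = sliding_window_view(img, win_shape)[::shift[0], ::shift[1]]
print(win.shape)  # (18, 30, 64, 64)

# second window along both axes starts at pixel (10, 10)
t, b, l, r = get_win_pixel_coords((1, 1), win_shape, shift)
print(t, b, l, r)  # 10 74 10 74
assert np.array_equal(img[t:b, l:r], win[1, 1])
```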

  

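"So I'm guessing that in that case I should make the grid equal to the original image's dimensions, is that right? (instead of using 8x8)."

No. If the windows tile the image without overlap (i.e. shiftSize=None, which I've assumed so far), then if you made the grid dimensions equal to the pixel dimensions of the image, every window would contain just a single pixel!

"So in my case, for an image of width 360 and height 240, this means I would use this line: grid_pos = np.unravel_index(12, (240, 360)). Also, what does the 12 refer to in this line?"

As I said, making the 'grid size' equal to the image dimensions would be pointless, since every window would contain only a single pixel (at least, assuming that the windows are non-overlapping). The 12 would refer to an index into the flattened grid of windows, e.g.:

x = np.arange(25).reshape(5, 5)    # 5x5 grid containing numbers from 0 ... 24
x_flat = x.ravel()                 # flatten it into a 25-long vector
print(x_flat[12])                  # the 12th element in the flattened vector
# 12
row, col = np.unravel_index(12, (5, 5))  # corresponding row/col index in x
print(x[row, col])
# 12

"I am shifting by 10 pixels with each window: the first sliding window starts at coordinates 0x0 on the image, the second at 10x10, and so on. I would like the program to return not just the window contents but also the coordinates corresponding to each window, i.e. 0,0, then 10,10, etc."

As I said, you can already get the position of the window relative to the image using the top, bottom, left, right coordinates returned by get_win_pixel_coords(). If you really wanted, you could wrap it up into a single function:

def get_pixels_and_coords(win_grid, grid_pos):
    pix = win_grid[grid_pos]
    tblr = get_win_pixel_coords(grid_pos, pix.shape)
    return pix, tblr

# e.g.:
pix, tblr = get_pixels_and_coords(win, (3, 4))

If you want the coordinates of every pixel in the window, relative to the image, another trick you could use is to construct arrays containing the row and column indices of every pixel in the image, then apply your sliding window to these.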
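
The index-array trick (arrays holding the row and column index of every pixel, windowed in the same way as the image) might look like the following. This is a sketch under assumptions: np.indices builds the index arrays, NumPy's sliding_window_view stands in for the linked sliding_window function, tile is a hypothetical helper name, and the image is synthetic:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic 512x512 image (an assumption, used instead of a real picture).
img = np.random.default_rng(0).integers(0, 256, size=(512, 512))
win_shape = (64, 64)

def tile(a, win_shape):
    """Non-overlapping grid of windows, shape (n_rows, n_cols, win_h, win_w)."""
    return sliding_window_view(a, win_shape)[::win_shape[0], ::win_shape[1]]

# ridx[i, j] == i and cidx[i, j] == j for every pixel of the image
ridx, cidx = np.indices(img.shape)

win = tile(img, win_shape)      # pixel values
r_win = tile(ridx, win_shape)   # row index of every pixel in every window
c_win = tile(cidx, win_shape)   # column index of every pixel in every window

pix = win[3, 4]
rows = r_win[3, 4]
cols = c_win[3, 4]

# indexing the image with the windowed coordinates returns the window itself
assert np.array_equal(img[rows, cols], pix)
print(rows.min(), rows.max(), cols.min(), cols.max())  # 192 255 256 319
```

The cost is two extra index arrays the size of the image, so for large images the four bounds from get_win_pixel_coords() are usually the cheaper option.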

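Answer 1 (score: 0)

To update @ali_m's answer, since scipy.misc.lena() is no longer available in newer SciPy releases: here is an example using the RGB image scipy.misc.face(), with some minor modifications to the sliding window source code provided in the OP.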

import numpy as np
from scipy.misc import ascent, face
from matplotlib import pyplot as plt
from numpy.lib.stride_tricks import as_strided as ast

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right


def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple,
    even for one-dimensional shapes.
    Parameters
        shape - an int, or a tuple of ints
    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')


def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions
    '''
    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)
    # convert ws, ss, and a.shape to numpy arrays
    ws = np.array(ws)
    ss = np.array(ss)
    shap = np.array(a.shape)
    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shap),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(\
        'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shap):
        raise ValueError(\
        'ws cannot be larger than a in any dimension.\
 a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    # how many slices will there be in each dimension?
    newshape = norm_shape(((shap - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    a = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return a
    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    #dim = filter(lambda i : i != 1,dim)
    return a.reshape(dim), newshape

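Adding the return variable newshape to sliding_window() lets you pass flatten=True and still know the layout of the grid created by the sliding window function. In my application a flattened vector of computation windows is desirable, because it is a good point at which to extend the computation applied to each computation window.

If a 96x96 window (i.e. tile x tile) is applied with 50% overlap in both directions to an image of shape (768, 1024, 3), the input image can be padded before the sliding windows are created, to ensure that it is divisible by a whole number of windows with no remainder.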

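img = face()
nxo, nyo, nzo = img.shape

tile = 96
# mirror the image to the right and downward so there is something to pad with
pad_img = np.vstack((np.hstack((img, np.fliplr(img))), np.flipud(np.hstack((img, np.fliplr(img))))))

# crop so that each spatial dimension is rounded up to the next multiple of tile
pad_img = pad_img[:nxo + (-nxo) % tile, :nyo + (-nyo) % tile, :]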
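
Mirror padding of this kind can also be written with np.pad, which avoids building the four-fold mirrored image first. A sketch, assuming a synthetic array with the same shape as face() and assuming the goal is to round each spatial dimension up to the next multiple of tile; pad_to_multiple is a hypothetical helper name:

```python
import numpy as np

def pad_to_multiple(img, tile):
    """Mirror-pad the bottom and right edges so rows and columns are multiples of tile."""
    pad_r = (-img.shape[0]) % tile   # rows to add (0 if already a multiple)
    pad_c = (-img.shape[1]) % tile   # columns to add
    # mode='symmetric' mirrors the image about its edge, like hstack((img, fliplr(img)))
    return np.pad(img, ((0, pad_r), (0, pad_c), (0, 0)), mode='symmetric')

# Synthetic RGB array with the shape of face() (an assumption).
img = np.random.default_rng(0).integers(0, 256, size=(768, 1024, 3), dtype=np.uint8)
pad_img = pad_to_multiple(img, 96)
print(pad_img.shape)  # (768, 1056, 3)
```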



win, ind = sliding_window(pad_img, (96, 96,3), (48,48,3))
print(ind)
# (15, 21, 1, 96, 96, 3)
print(win.shape)
# (315, 96, 96, 3)
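
The first three entries of ind follow from the window-count formula used inside sliding_window(), (size - window) // step + 1 per dimension. A small sketch, assuming a padded image of shape (768, 1056, 3); n_windows is a hypothetical helper name:

```python
def n_windows(shape, ws, ss):
    """Number of window positions per dimension: (size - window) // step + 1."""
    return tuple((n - w) // s + 1 for n, w, s in zip(shape, ws, ss))

grid = n_windows((768, 1056, 3), (96, 96, 3), (48, 48, 3))
print(grid)                         # (15, 21, 1)
print(grid[0] * grid[1] * grid[2])  # 315
```

Because of the floor division, any padded width from 1056 to 1103 gives the same 21 columns; pixels beyond the last full window are simply not covered.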

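The grid of computation windows has 15 rows and 21 columns, i.e. 315 computation windows. grid_pos can be determined from the index into the flattened vector of computation windows (i.e. win) together with ind[0] and ind[1]. If we are interested in the 239th computation window: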

grid_pos = np.unravel_index(239,(ind[0],ind[1]))
print(grid_pos)
#(11, 8)

The bounding coordinates of that computation window in the padded image can then be found with:

t, b, l, r = get_win_pixel_coords(grid_pos, (96, 96), (48,48))
print(np.all(pad_img[t:b, l:r] == win[239]))
#True