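For my neural network I want to augment my training data by adding small random rotations and zooms to my images. The issue I am having is that scipy changes the size of my images when it applies the rotations and zooms. I need to just clip the edges if part of the image goes out of bounds. All of my images must be the same size.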
def loadImageData(img, distort = False):
    c, fn = img
    img = scipy.ndimage.imread(fn, True)

    if distort:
        img = scipy.ndimage.zoom(img, 1 + 0.05 * rnd(), mode = 'constant')
        img = scipy.ndimage.rotate(img, 10 * rnd(), mode = 'constant')
        print(img.shape)

    img = img - np.min(img)
    img = img / np.max(img)
    img = np.reshape(img, (1, *img.shape))

    y = np.zeros(ncats)
    y[c] = 1
    return (img, y)
Answer 0 (score: 29)
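scipy.ndimage.rotate accepts a reshape= parameter:

reshape : bool, optional
    If reshape is true, the output shape is adapted so that the input array is contained completely in the output. Default is True.

So to "clip" the edges you can simply call scipy.ndimage.rotate(img, ..., reshape=False).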
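from scipy.ndimage import rotate
try:
    from scipy.datasets import face  # SciPy >= 1.10 (fetches the sample image via pooch)
except ImportError:
    from scipy.misc import face      # older SciPy releases
from matplotlib import pyplot as plt

img = face()
rot = rotate(img, 30, reshape=False)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(img)
ax[1].imshow(rot)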
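To see the effect of reshape= without plotting anything, you can compare output shapes directly. This is a minimal sketch on a synthetic array; the 60x80 size and the 30 degree angle are arbitrary choices for illustration.

```python
import numpy as np
from scipy.ndimage import rotate

# Synthetic grayscale image with a bright rectangle in the middle
img = np.zeros((60, 80))
img[20:40, 30:50] = 1.0

expanded = rotate(img, 30, reshape=True)   # default: canvas grows to fit the rotated image
clipped = rotate(img, 30, reshape=False)   # canvas stays fixed, corners are cut off

print("input:", img.shape)
print("reshape=True:", expanded.shape)
print("reshape=False:", clipped.shape)
```

With reshape=False the output always has the shape of the input, which is what a fixed-size network input needs.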
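Things are more complicated for scipy.ndimage.zoom.

A naive approach would be to zoom the entire input array, then use slice indexing and/or zero-padding to make the output the same size as the input. However, when you are increasing the size of the image, it is wasteful to interpolate pixels that will only be clipped off at the edges anyway.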
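For reference, the naive approach could look roughly like the sketch below. The function name naive_fixed_size_zoom is made up for this illustration, and it assumes a 2-D (grayscale) image.

```python
import numpy as np
from scipy.ndimage import zoom

def naive_fixed_size_zoom(img, zoom_factor, **kwargs):
    """Zoom the whole 2-D image, then centre-crop or zero-pad back to its shape."""
    h, w = img.shape
    zoomed = zoom(img, zoom_factor, **kwargs)  # interpolates every pixel
    zh, zw = zoomed.shape

    out = np.zeros_like(img)
    # Size of the region shared by the zoomed image and the output canvas
    ch, cw = min(h, zh), min(w, zw)
    # Centre that region in both the source and the destination
    src_top, src_left = (zh - ch) // 2, (zw - cw) // 2
    dst_top, dst_left = (h - ch) // 2, (w - cw) // 2
    out[dst_top:dst_top + ch, dst_left:dst_left + cw] = (
        zoomed[src_top:src_top + ch, src_left:src_left + cw]
    )
    return out

img = np.random.default_rng(0).random((60, 80))
zoomed_out = naive_fixed_size_zoom(img, 0.5)
zoomed_in = naive_fixed_size_zoom(img, 1.5)
```

When zooming in by 1.5x, this interpolates a 90x120 array and then throws away more than half of it, which is the waste described above.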
Instead, you could index only the part of the input that will fall within the bounds of the output array before you apply zoom:
import numpy as np
from scipy.ndimage import zoom


def clipped_zoom(img, zoom_factor, **kwargs):

    h, w = img.shape[:2]

    # For multichannel images we don't want to apply the zoom factor to the RGB
    # dimension, so instead we create a tuple of zoom factors, one per array
    # dimension, with 1's for any trailing dimensions after the width and height.
    zoom_tuple = (zoom_factor,) * 2 + (1,) * (img.ndim - 2)

    # Zooming out
    if zoom_factor < 1:

        # Bounding box of the zoomed-out image within the output array
        zh = int(np.round(h * zoom_factor))
        zw = int(np.round(w * zoom_factor))
        top = (h - zh) // 2
        left = (w - zw) // 2

        # Zero-padding
        out = np.zeros_like(img)
        out[top:top+zh, left:left+zw] = zoom(img, zoom_tuple, **kwargs)

    # Zooming in
    elif zoom_factor > 1:

        # Bounding box of the zoomed-in region within the input array
        zh = int(np.round(h / zoom_factor))
        zw = int(np.round(w / zoom_factor))
        top = (h - zh) // 2
        left = (w - zw) // 2

        out = zoom(img[top:top+zh, left:left+zw], zoom_tuple, **kwargs)

        # `out` might still be slightly larger than `img` due to rounding, so
        # trim off any extra pixels at the edges
        trim_top = ((out.shape[0] - h) // 2)
        trim_left = ((out.shape[1] - w) // 2)
        out = out[trim_top:trim_top+h, trim_left:trim_left+w]

    # If zoom_factor == 1, just return the input array
    else:
        out = img
    return out
For example:
zm1 = clipped_zoom(img, 0.5)
zm2 = clipped_zoom(img, 1.5)
fig, ax = plt.subplots(1, 3)
ax[0].imshow(img)
ax[1].imshow(zm1)
ax[2].imshow(zm2)
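If you want the rotation and the zoom in a single interpolation pass, a different technique from the two-step approach above is scipy.ndimage.affine_transform, which keeps the input shape by default. The sketch below is only an illustration for 2-D grayscale images; the function name rotate_zoom_same_size is made up, and drawing the random angle and zoom from a uniform distribution is an assumption about what rnd() does in the question.

```python
import numpy as np
from scipy.ndimage import affine_transform

def rotate_zoom_same_size(img, angle_deg, zoom_factor, order=1):
    """Rotate and zoom a 2-D image about its centre, keeping its shape."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # affine_transform maps each OUTPUT coordinate to an INPUT coordinate,
    # so we pass the inverse of the forward transform (zoom_factor * rot).
    inverse = rot.T / zoom_factor
    centre = (np.array(img.shape) - 1) / 2.0
    # Choose the offset so that the image centre maps onto itself
    offset = centre - inverse @ centre
    return affine_transform(img, inverse, offset=offset, order=order,
                            mode='constant', cval=0.0)

rng = np.random.default_rng(0)
img = rng.random((48, 64))
augmented = rotate_zoom_same_size(
    img,
    angle_deg=10 * rng.uniform(-1, 1),        # small random rotation
    zoom_factor=1 + 0.05 * rng.uniform(-1, 1) # small random zoom
)
```

Pixels that fall outside the input are filled with cval, so zooming out pads with zeros and zooming in clips the edges, as in clipped_zoom.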
Answer 1 (score: 3)
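I recommend using cv2.resize because it is much faster than scipy.ndimage.zoom, probably because it supports simpler interpolation methods.

For a 480x640 image:
cv2.resize takes ~2 ms
scipy.ndimage.zoom takes ~500 ms
scipy.ndimage.zoom(...,order=0) takes ~175 ms

If you are doing data augmentation, a speedup of this size is invaluable because it means more experiments in less time.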
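The numbers above depend on hardware and library versions, so it is worth measuring on your own machine. Here is a rough timing sketch; the 1.5x zoom factor and the float32 random image are assumptions for illustration (the original measurement setup is not stated), and cv2.resize is only timed if OpenCV is installed.

```python
import timeit
import numpy as np
from scipy.ndimage import zoom

img = np.random.default_rng(0).random((480, 640)).astype(np.float32)

def best_ms(fn, repeats=3):
    """Best-of-N wall-clock time of fn() in milliseconds."""
    return 1000 * min(timeit.repeat(fn, number=1, repeat=repeats))

timings = {
    "scipy zoom (order=3)": best_ms(lambda: zoom(img, 1.5)),
    "scipy zoom (order=0)": best_ms(lambda: zoom(img, 1.5, order=0)),
}

try:
    import cv2  # optional: only timed if OpenCV is installed
    # cv2.resize takes the target size as (width, height)
    timings["cv2.resize"] = best_ms(lambda: cv2.resize(img, (960, 720)))
except ImportError:
    pass

for name, ms in timings.items():
    print(f"{name}: {ms:.1f} ms")
```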
Here is a version of clipped_zoom that uses cv2.resize:
import cv2
import numpy as np


def cv2_clipped_zoom(img, zoom_factor):
    """
    Center zoom in/out of the given image, returning an enlarged/shrunken view of
    the image without changing its dimensions.

    Args:
        img : Image array
        zoom_factor : amount of zoom as a ratio (0 to Inf)
    """
    height, width = img.shape[:2]  # It's also the final desired shape
    new_height, new_width = int(height * zoom_factor), int(width * zoom_factor)

    ### Crop only the part that will remain in the result (more efficient)
    # Centered bbox of the final desired size in resized (larger/smaller) image coordinates
    y1, x1 = max(0, new_height - height) // 2, max(0, new_width - width) // 2
    y2, x2 = y1 + height, x1 + width
    bbox = np.array([y1, x1, y2, x2])
    # Map back to original image coordinates
    # (np.int was removed in NumPy 1.24; the builtin int is equivalent here)
    bbox = (bbox / zoom_factor).astype(int)
    y1, x1, y2, x2 = bbox
    cropped_img = img[y1:y2, x1:x2]

    # Handle padding when downscaling
    resize_height, resize_width = min(new_height, height), min(new_width, width)
    pad_height1, pad_width1 = (height - resize_height) // 2, (width - resize_width) // 2
    pad_height2, pad_width2 = (height - resize_height) - pad_height1, (width - resize_width) - pad_width1
    pad_spec = [(pad_height1, pad_height2), (pad_width1, pad_width2)] + [(0, 0)] * (img.ndim - 2)

    # cv2.resize takes the target size as (width, height)
    result = cv2.resize(cropped_img, (resize_width, resize_height))
    result = np.pad(result, pad_spec, mode='constant')
    assert result.shape[0] == height and result.shape[1] == width
    return result