I'll keep this short. Suppose I have the following 2 images, with shape N (number_of_images) x H (height_of_images) x W (width_of_images) x D (channels), defined in numpy as:
import numpy as np

input_rgb = np.array(
    [
        [
            [[.7], [.6], [.3]],
            [[.2], [.0], [.0]],
            [[.1], [.2], [.9]]
        ],
        [
            [[.7], [.6], [.3]],
            [[.2], [.0], [.0]],
            [[.1], [.2], [.9]]
        ]
    ])
and the following 2 kernels, with shape M (number_of_kernels) x H (height_of_kernel) x W (width_of_kernel) x D (channels), defined as:
kernel = np.array(
    [
        [
            [[.2], [.1]],
            [[.1], [.7]]
        ],
        [
            [[.9], [.7]],
            [[.1], [.5]]
        ]
    ])
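I want to convolve the 2 images above with the 2 kernels above. To do this, I implemented a very simple einsum-based solution in numpy, which works well as long as it is given a single image. For a single image, my algorithm is: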
def convolve_1m(input_image, kernels, padding=0, stride=1):
    image_height, image_width, image_channels = input_image.shape
    number_of_kernels, kernel_height, kernel_width, kernel_depth = kernels.shape
    assert image_channels == kernel_depth
    # Zero-pad the two spatial axes only; the channel axis is left alone.
    input_image = np.pad(input_image, ((padding, padding), (padding, padding), (0, 0)),
                         mode='constant', constant_values=(0,))
    # Flip every kernel by 180 degrees in its spatial plane.
    kernels = np.rot90(kernels, k=2, axes=(1, 2))
    fm_height = (image_height - kernel_height + 2*padding) // stride + 1
    fm_width = (image_width - kernel_width + 2*padding) // stride + 1
    feature_maps = np.zeros(shape=(fm_height, fm_width, number_of_kernels))
    for i in range(fm_height):
        for j in range(fm_width):
            x = input_image[i * stride:i * stride + kernel_height, j * stride:j * stride + kernel_width, :]
            # Sum over i, j, k; one value per kernel m remains.
            feature_maps[i, j, :] = np.einsum('ijk,mijk', x, kernels)
    return feature_maps
If I use:
convolution = np.array([convolve_1m(input_rgb[0], kernel), convolve_1m(input_rgb[1], kernel)])
print(convolution.shape)
print(convolution)
I get the following result:
(2, 2, 2, 2)
[[[[0.57 0.55]
[0.45 0.33]]
[[0.19 0.35]
[0.2 0.95]]]
[[[0.57 0.55]
[0.45 0.33]]
[[0.19 0.35]
[0.2 0.95]]]]
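For reference, the first output position can be checked by hand. The following is a minimal sketch of that check. The names `patch`, `k0` and `k1` are introduced here for illustration only; they hold the top-left 2x2 patch of the image and the two kernels with the single channel axis dropped:

```python
import numpy as np

# Top-left 2x2 patch of the first image (channel axis dropped).
patch = np.array([[.7, .6],
                  [.2, .0]])

# The two kernels from the question (channel axis dropped).
k0 = np.array([[.2, .1],
               [.1, .7]])
k1 = np.array([[.9, .7],
               [.1, .5]])

# convolve_1m rotates each kernel by 180 degrees before multiplying,
# so the same rotation is applied here.
first = np.sum(patch * np.rot90(k0, 2))   # .49 + .06 + .02 + 0
second = np.sum(patch * np.rot90(k1, 2))  # .35 + .06 + .14 + 0
print(round(first, 2), round(second, 2))  # 0.57 0.55
```

These two values agree with the first row of the printed result, `[0.57 0.55]`.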
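This looks perfect, at least according to my own calculations on paper. Now for the problematic part. The call above is not ideal, because I have to rebuild an np.array on the caller's side before it can be passed to the next convolutional layer. So I tried the following instead: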
def convolve(input_images, kernels, padding=0, stride=1):
    number_of_images, image_height, image_width, image_channels = input_images.shape
    number_of_kernels, kernel_height, kernel_width, kernel_depth = kernels.shape
    assert image_channels == kernel_depth
    # Zero-pad the two spatial axes only; batch and channel axes are left alone.
    input_images = np.pad(input_images, ((0, 0), (padding, padding), (padding, padding), (0, 0)),
                          mode='constant', constant_values=(0,))
    # Use the `kernels` argument here, not the global `kernel`.
    kernels = np.rot90(kernels, k=2, axes=(1, 2))
    fm_height = (image_height - kernel_height + 2*padding) // stride + 1
    fm_width = (image_width - kernel_width + 2*padding) // stride + 1
    feature_maps = np.zeros(shape=(number_of_images, fm_height, fm_width, number_of_kernels))
    for i in range(fm_height):
        for j in range(fm_width):
            x = input_images[:, i * stride:i * stride + kernel_height, j * stride:j * stride + kernel_width, :]
            feature_maps[:, i, j, :] = np.einsum('nijk,mijk', x, kernels)
    return feature_maps
convolution = convolve(input_rgb, kernel)
print(convolution.shape)
print(convolution)
Although the values themselves look fine, they end up along the wrong dimensions:
(2, 2, 2, 2)
[[[[0.57 0.57]
[0.45 0.45]]
[[0.19 0.19]
[0.2 0.2 ]]]
[[[0.55 0.55]
[0.33 0.33]]
[[0.35 0.35]
[0.95 0.95]]]]
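A possible explanation for this layout, offered as a sketch rather than a definitive diagnosis: when no `->` is given, `np.einsum` orders the output axes alphabetically by label, so `'nijk,mijk'` produces an array of shape (M, N) rather than (N, M). With N = M = 2 the assignment still succeeds, only transposed. The shapes below are hypothetical and chosen so that the two orders are distinguishable:

```python
import numpy as np

# Hypothetical shapes with N != M, so the axis order becomes visible.
x = np.ones((3, 2, 2, 1))        # N=3 image patches of 2x2x1
kernels = np.ones((5, 2, 2, 1))  # M=5 kernels of 2x2x1

# Implicit mode: the output axes are the unsummed labels in
# alphabetical order, so 'm' comes before 'n'.
implicit = np.einsum('nijk,mijk', x, kernels)
print(implicit.shape)  # (5, 3)

# Explicit mode: the output order is spelled out after '->'.
explicit = np.einsum('nijk,mijk->nm', x, kernels)
print(explicit.shape)  # (3, 5)
```

Under this reading, writing the subscripts as `'nijk,mijk->nm'` would give the (N, M) layout that `feature_maps[:, i, j, :]` expects.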
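Can someone help me figure out how to make the N-M case above work using array slicing, rather than with an extra range loop like the one in this version?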
def convolve(input_images, kernels, padding=0, stride=1):
    number_of_images, image_height, image_width, image_channels = input_images.shape
    number_of_kernels, kernel_height, kernel_width, kernel_depth = kernels.shape
    assert image_channels == kernel_depth
    # Zero-pad the two spatial axes only; batch and channel axes are left alone.
    input_images = np.pad(input_images, ((0, 0), (padding, padding), (padding, padding), (0, 0)),
                          mode='constant', constant_values=(0,))
    # Use the `kernels` argument here, not the global `kernel`.
    kernels = np.rot90(kernels, k=2, axes=(1, 2))
    fm_height = (image_height - kernel_height + 2*padding) // stride + 1
    fm_width = (image_width - kernel_width + 2*padding) // stride + 1
    feature_maps = np.zeros(shape=(number_of_images, fm_height, fm_width, number_of_kernels))
    for n in range(number_of_images):
        for i in range(fm_height):
            for j in range(fm_width):
                x = input_images[n, i * stride:i * stride + kernel_height, j * stride:j * stride + kernel_width, :]
                feature_maps[n, i, j, :] = np.einsum('ijk,mijk', x, kernels)
    return feature_maps
While this works and gives the correct result, I would like to do it without the outermost for-(n)-loop.
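One way this might be done, as a minimal sketch and not necessarily the best approach: keep the slice over the image axis and give `np.einsum` an explicit output order. The function name `convolve_batched` and the helper variable `image` are introduced here for illustration; the rest follows the code in the question.

```python
import numpy as np

def convolve_batched(input_images, kernels, padding=0, stride=1):
    n_images, image_height, image_width, image_channels = input_images.shape
    n_kernels, kernel_height, kernel_width, kernel_depth = kernels.shape
    assert image_channels == kernel_depth
    # Zero-pad the spatial axes only.
    input_images = np.pad(
        input_images,
        ((0, 0), (padding, padding), (padding, padding), (0, 0)),
        mode='constant', constant_values=0)
    # Flip every kernel by 180 degrees in its spatial plane.
    kernels = np.rot90(kernels, k=2, axes=(1, 2))
    fm_height = (image_height - kernel_height + 2 * padding) // stride + 1
    fm_width = (image_width - kernel_width + 2 * padding) // stride + 1
    feature_maps = np.zeros((n_images, fm_height, fm_width, n_kernels))
    for i in range(fm_height):
        for j in range(fm_width):
            # One patch per image: shape (N, kernel_height, kernel_width, D).
            x = input_images[:, i * stride:i * stride + kernel_height,
                             j * stride:j * stride + kernel_width, :]
            # '->nm' fixes the output layout to (images, kernels).
            feature_maps[:, i, j, :] = np.einsum('nijk,mijk->nm', x, kernels)
    return feature_maps

# Same data as in the question, built more compactly.
image = np.array([[.7, .6, .3],
                  [.2, .0, .0],
                  [.1, .2, .9]])[..., None]
input_rgb = np.stack([image, image])                    # shape (2, 3, 3, 1)
kernel = np.array([[[.2, .1], [.1, .7]],
                   [[.9, .7], [.1, .5]]])[..., None]    # shape (2, 2, 2, 1)

result = convolve_batched(input_rgb, kernel)
print(result.shape)  # (2, 2, 2, 2)
print(result[0])
```

With the data from the question, `result[0]` and `result[1]` should both match the per-image output of `convolve_1m` shown earlier. The two spatial loops are still present; only the loop over images is replaced by slicing.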