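First of all, I'm not using any framework, since I only started learning about convolutional layers about a week ago. I understand the basics of forward propagation and haven't touched the backward pass yet, but that is out of scope here. This is what confuses me:
Suppose I have 4 images, each with 3 channels of size 4x4: 4x3x4x4. I have K kernels of size 3x3 with 3 channels each: Kx3x3x3.
I'm trying to compute the convolution over all 4 images, but I always get lost in the dimensions. Here is what I tried: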
import numpy as np
w = np.array(
[
# Img: 1, 4x4 image with 3 channels
[
[
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]
],
[
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]
],
[
[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2]
]
],
# Img: 2, 4x4 image with 3 channels
[
[
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]
],
[
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]
],
[
[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2]
]
],
# Img: 3, 4x4 image with 3 channels
[
[
[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2]
],
[
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]
],
[
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]
]
],
# Img: 4, 4x4 image with 3 channels
[
[
[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2]
],
[
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]
],
[
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]
]
]
]
)
f = np.array(
[
# Filter: 1, 3x3 filter for 3 channels -> All 1s
[
[
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]
],
[
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]
],
[
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]
]
],
# Filter: 2, 3x3 filter for 3 channels -> All 2s
[
[
[2, 2, 2],
[2, 2, 2],
[2, 2, 2]
],
[
[2, 2, 2],
[2, 2, 2],
[2, 2, 2]
],
[
[2, 2, 2],
[2, 2, 2],
[2, 2, 2]
]
]
]
)
hori_dimension = (w.shape[3] - f.shape[3]) // 1 + 1
vert_dimension = (w.shape[2] - f.shape[2]) // 1 + 1
r = np.zeros(shape=(w.shape[0], f.shape[0], vert_dimension, hori_dimension))
for i in range(vert_dimension):
    for j in range(hori_dimension):
        r[:, :, i, j] += np.sum(w[:, :, i:i+3, j:j+3] * f, axis=(1, 2, 3))
print(r)
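This doesn't work; the problem is in this part, where I have K kernels for N images.
However, if I have only one kernel, I can define the operation like this (this works fine):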
hori_dimension = (w.shape[3] - f.shape[3]) // 1 + 1
vert_dimension = (w.shape[2] - f.shape[2]) // 1 + 1
r = np.zeros(shape=(w.shape[0], vert_dimension, hori_dimension))
for i in range(vert_dimension):
    for j in range(hori_dimension):
        r[:, i, j] += np.sum(w[:, :, i:i+3, j:j+3] * f, axis=(1, 2, 3))
print(r)
This gives a 2x2 feature map for each image:
[[[27. 27.]
[27. 27.]]
[[27. 27.]
[27. 27.]]
[[27. 27.]
[27. 27.]]
[[27. 27.]
[27. 27.]]]
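This seems correct: I have 1 kernel, so there is 1 feature map per image, which gives 4 two-dimensional feature maps. For the code above I expected one more dimension, i.e. 4 feature maps for each kernel, but I can't figure it out.
@Update:
This seems to do the job: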
p = 0
s = 1
number_of_input_images, number_of_image_channels, height_of_image, width_of_image = w.shape
number_of_kernels, number_of_kernel_channels, height_of_kernel, width_of_kernel = f.shape
assert(number_of_image_channels == number_of_kernel_channels)
width_of_features = (width_of_image - width_of_kernel + 2*p) // s + 1
height_of_features = (height_of_image - height_of_kernel + 2*p) // s + 1
feature_maps = np.zeros(shape=(number_of_input_images, number_of_kernels, height_of_features, width_of_features))
for k in range(f.shape[0]):
    for i in range(height_of_features):
        for j in range(width_of_features):
            feature_maps[:, k, i, j] += np.sum(w[:, :, i:i+3, j:j+3] * f[k], axis=(1, 2, 3))
print(feature_maps)
It produces the following feature maps:
[
# pic1
[
# kernel1
[
[27. 27.]
[27. 27.]
]
# kernel2
[
[54. 54.]
[54. 54.]
]
]
# pic2
[
#kernel1
[
[27. 27.]
[27. 27.]
]
#kernel2
[
[54. 54.]
[54. 54.]
]
]
#pic3
[
#kernel1
[
[27. 27.]
[27. 27.]
]
#kernel2
[
[54. 54.]
[54. 54.]
]
]
#pic4
[
#kernel1
[
[27. 27.]
[27. 27.]
]
#kernel2
[
[54. 54.]
[54. 54.]
]
]
]
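Is there a better way? Is this correct? It looks fine to me. With one image and multiple kernels, is the result of the convolution the feature maps from each kernel placed one after another? So with K kernels, each producing an N x N feature map, the output of the convolution layer would be K x N x N. In that case the above seems correct, I guess? As I said, I really get lost in these N dimensions...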
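A minimal sketch of that shape reasoning, under stated assumptions: the arrays `images` and `kernels` below are hypothetical stand-ins (all ones and all twos) rather than the exact data above, with stride 1 and no padding. It handles all kernels at once by broadcasting, so the output gets shape (N images, K kernels, height, width).

```python
import numpy as np

# Hypothetical stand-ins: 4 images with 3 channels of 4x4 pixels,
# and 2 kernels with 3 channels of 3x3 weights (all 1s and all 2s).
images = np.ones((4, 3, 4, 4))
kernels = np.stack([np.ones((3, 3, 3)), 2 * np.ones((3, 3, 3))])

n, c, h, w = images.shape
k, _, kh, kw = kernels.shape
out_h, out_w = h - kh + 1, w - kw + 1  # stride 1, no padding

feature_maps = np.zeros((n, k, out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = images[:, :, i:i + kh, j:j + kw]  # shape (n, c, kh, kw)
        # Add a kernel axis to the patch and an image axis to the kernels so
        # they broadcast to (n, k, c, kh, kw), then sum over c, kh and kw.
        feature_maps[:, :, i, j] = (patch[:, None] * kernels[None]).sum(axis=(2, 3, 4))

print(feature_maps.shape)  # (4, 2, 2, 2)
```

Each image ends up with one 2x2 map per kernel, stacked along the second axis.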
@Update:
In the end I came up with the following code for valid (forward) / full (backpropagation) convolution:
import typing

import numpy as np


def convolve(sources: np.ndarray,
             kernels: np.ndarray,
             mode: str = 'valid',
             padding: typing.Tuple[int, int] = (0, 0),
             stride: int = 1):
    number_of_input_images, number_of_image_channels, height_of_image, width_of_image = sources.shape
    number_of_kernels, number_of_kernel_channels, height_of_kernel, width_of_kernel = kernels.shape
    assert(number_of_image_channels == number_of_kernel_channels)
    if mode == 'full':
        # Full convolution pads with kernel size - 1 on every side
        padding = (height_of_kernel - 1, width_of_kernel - 1)
    if any(padding):
        sources = np.pad(sources,
                         ((0, 0), (0, 0), (padding[0], padding[0]), (padding[1], padding[1])),
                         mode='constant', constant_values=0)
    # Rotate every kernel by 180 degrees (convolution rather than cross-correlation)
    kernels = np.rot90(kernels, k=2, axes=(2, 3))
    width_of_features = (width_of_image - width_of_kernel + 2*padding[1]) // stride + 1
    height_of_features = (height_of_image - height_of_kernel + 2*padding[0]) // stride + 1
    feature_maps = np.zeros(shape=(number_of_input_images, number_of_kernels, height_of_features, width_of_features))
    for k in range(number_of_kernels):
        for i in range(height_of_features):
            for j in range(width_of_features):
                # The window origin moves by `stride`; the window size follows the kernel
                top, left = i * stride, j * stride
                window = sources[:, :, top:top + height_of_kernel, left:left + width_of_kernel]
                feature_maps[:, k, i, j] = np.einsum('ncij,cij', window, kernels[k])
    return feature_maps
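Any feedback would be greatly appreciated. I read that the kernel has to be rotated when doing convolution, so I rotate it by 90 degrees twice. There is also the option of custom padding, and to get a full convolution I pad with the kernel size minus 1, so that all surrounding elements are zero and there are no index errors.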
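One way to sanity-check the "pad with kernel size minus 1 and rotate twice" idea is to compare against NumPy's own 1-D `np.convolve` in `'full'` mode. The sketch below assumes a single-channel image and a single kernel; `full_convolve2d` is a hypothetical helper written for this check, not part of the code above.

```python
import numpy as np


def full_convolve2d(image, kernel):
    """Full 2-D convolution of one single-channel image (hypothetical helper)."""
    kh, kw = kernel.shape
    # Pad by kernel size - 1 so every partial overlap is covered by zeros.
    padded = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode='constant')
    flipped = np.rot90(kernel, k=2)  # 180-degree rotation = flip along both axes
    out_h = image.shape[0] + kh - 1
    out_w = image.shape[1] + kw - 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out


# A one-row "image" and one-row kernel reduce the problem to 1-D,
# so the result can be compared with np.convolve.
row = np.array([[1.0, 2.0, 3.0, 4.0]])
ker = np.array([[1.0, 0.0, -1.0]])
result = full_convolve2d(row, ker)
reference = np.convolve(row[0], ker[0], mode='full')
print(np.allclose(result[0], reference))
```

With padding of kernel size minus 1 the output has size image + kernel - 1 along each axis, which is the size of a full convolution.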
Answer 0 (score: 1)
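Let's look at one image and one kernel at a time. If your image has size w x h and the kernel has size f*f, and you stride s pixels at a time, and you pad the image with p pixels, then convolving 1 image with 1 kernel results in an image of size ((w-f+2*p)/s + 1, (h-f+2*p)/s + 1). In your case, w=h=4, f=3, s=1 and p=0.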
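As a small illustration of the size formula, here is a sketch with a hypothetical helper `conv_output_size`. It assumes floor division, as the code in the question does, so that the result is an integer.

```python
def conv_output_size(w, h, f, p=0, s=1):
    """Return (width, height) after convolving a w x h image with an f x f kernel.

    p is the padding on each side and s is the stride; floor division is an
    assumption taken from the question's code.
    """
    return (w - f + 2 * p) // s + 1, (h - f + 2 * p) // s + 1


print(conv_output_size(4, 4, 3))       # (2, 2)
print(conv_output_size(4, 4, 3, p=1))  # (4, 4)
```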
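The kernel is applied to f*f patches of the image. Since you have 3 channels, each patch has 3 channels. Sliding across the image creates many such patches, and since each patch produces a single number when multiplied with the kernel, you end up with a matrix of numbers, one for every patch that makes up the image.
This is done for each image, so you end up with one smaller convolved image per image.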
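The patch view described above can be sketched without explicit loops. This assumes NumPy 1.20 or newer (for `sliding_window_view`), stride 1 and no padding; the array names and the leading kernel axis are illustrative choices, not something from the answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

images = np.ones((2, 3, 4, 4))   # (N, C, H, W)
kernels = np.ones((1, 3, 3, 3))  # (K, C, f, f), here a single kernel

# Every 3x3 patch along the two spatial axes: (N, C, out_h, out_w, f, f)
patches = sliding_window_view(images, (3, 3), axis=(2, 3))
print(patches.shape)  # (2, 3, 2, 2, 3, 3)

# Each patch (over all channels) times each kernel collapses to one number.
out = np.einsum('nchwij,kcij->nkhw', patches, kernels)
print(out.shape)  # (2, 1, 2, 2)
```

Every entry of `out` is 27 here, matching the loop version, because each patch contains 3 x 3 x 3 ones.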
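import numpy as np

images = np.ones((2, 3, 4, 4))
kernel = np.ones((3, 3, 3))
w = 4
f = 3
p = 0
s = 1
out = (w - f + 2*p) // s + 1  # output size along each spatial axis
r = np.ones((2, out, out))
for i, image in enumerate(images):
    for row in range(out):
        for col in range(out):
            # Sum over all 3 channels of the f x f patch (s = 1, so the offset equals the index)
            x = np.sum(image[:, row:row+f, col:col+f] * kernel)
            r[i, row, col] = x
print(r)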
Output:
[[[27. 27.]
[27. 27.]]
[[27. 27.]
[27. 27.]]]
Convolving 2 images of size 4x4 with a kernel of size 3x3 gives you 2 images of size 2x2 (verify: ((4-3+0)/1 + 1, (4-3+0)/1 + 1) = (2, 2)).
Must-read resource: CS231n
Answer 1 (score: 0)
In the end I arrived at the following code to handle batched convolution:
import numpy as np


def convolve(input_images, kernels, padding=0, stride=1):
    # Channels-last layout: images are (N, H, W, C), kernels are (K, kH, kW, C)
    number_of_images, image_height, image_width, image_channels = input_images.shape
    number_of_kernels, kernel_height, kernel_width, kernel_depth = kernels.shape
    assert (image_channels == kernel_depth)
    input_images = np.pad(input_images, ((0, 0), (padding, padding), (padding, padding), (0, 0)),
                          mode='constant', constant_values=(0,))
    # Rotate every kernel by 180 degrees in its spatial axes
    kernels = np.rot90(kernels, k=2, axes=(1, 2))
    fm_height = (image_height - kernel_height + 2*padding) // stride + 1
    fm_width = (image_width - kernel_width + 2*padding) // stride + 1
    feature_maps = np.zeros(shape=(number_of_images, fm_height, fm_width, number_of_kernels))
    for n in range(number_of_images):
        for i in range(fm_height):
            for j in range(fm_width):
                x = input_images[n, i * stride:i * stride + kernel_height, j * stride:j * stride + kernel_width, :]
                # Contract the patch (i, j, k) against all kernels (m, i, j, k) at once
                feature_maps[n, i, j, :] = np.einsum('ijk,mijk', x, kernels)
    return feature_maps
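Note that this function unpacks its inputs as channels-last, (N, H, W, C), whereas the arrays in the question are channels-first, (N, C, H, W). A minimal sketch of converting between the two with `np.transpose` follows; the array names are hypothetical.

```python
import numpy as np

# Hypothetical channels-first data shaped like the arrays in the question.
images_nchw = np.arange(4 * 3 * 4 * 4).reshape(4, 3, 4, 4)  # (N, C, H, W)
kernels_kchw = np.ones((2, 3, 3, 3))                        # (K, C, kH, kW)

# Move the channel axis to the end: (N, C, H, W) -> (N, H, W, C)
images_nhwc = np.transpose(images_nchw, (0, 2, 3, 1))
kernels_khwc = np.transpose(kernels_kchw, (0, 2, 3, 1))

print(images_nhwc.shape)   # (4, 4, 4, 3)
print(kernels_khwc.shape)  # (2, 3, 3, 3)
```

The kernel shape looks unchanged here only because every kernel dimension happens to be 3; the axes are still reordered.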