CNN: forward pass over a batch of images - stuck when using multiple kernels

Time: 2019-05-17 12:20:00

Tags: python numpy conv-neural-network

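First of all, I'm not using any framework; I started learning about convolutional layers just last week. I understand the basics of forward propagation and haven't even touched the backward pass yet, but that's out of scope here. This is my confusion:

Say I have 4 images, each with 3 channels of size 4x4: 4x3x4x4. I have K kernels, each with 3 channels of size 3x3: K x 3x3x3.

I'm trying to compute the convolution over all 4 images, but I always get lost in the dimensions. Here is what I tried: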

import numpy as np


w = np.array(
    [
        # Img: 1, 4x4 image with 3 channels
        [
            [
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1]
            ],
            [
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]
            ],
            [
                [2, 2, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2]
            ]
        ],
        # Img: 2, 4x4 image with 3 channels
        [
            [
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]
            ],
            [
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1]
            ],
            [
                [2, 2, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2]
            ]
        ],
        # Img: 3, 4x4 image with 3 channels
        [
            [
                [2, 2, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2]
            ],
            [
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]
            ],
            [
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1]
            ]
        ],
        # Img: 4, 4x4 image with 3 channels
        [
            [
                [2, 2, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2]
            ],
            [
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1]
            ],
            [
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]
            ]
        ]
    ]
)

f = np.array(
    [
        # Filter: 1, 3x3 filter for 3 channels -> All 1s
        [
            [
                [1, 1, 1],
                [1, 1, 1],
                [1, 1, 1]
            ],
            [
                [1, 1, 1],
                [1, 1, 1],
                [1, 1, 1]
            ],
            [
                [1, 1, 1],
                [1, 1, 1],
                [1, 1, 1]
            ]
        ],
        # Filter: 2, 3x3 filter for 3 channels -> All 2s
        [
            [
                [2, 2, 2],
                [2, 2, 2],
                [2, 2, 2]
            ],
            [
                [2, 2, 2],
                [2, 2, 2],
                [2, 2, 2]
            ],
            [
                [2, 2, 2],
                [2, 2, 2],
                [2, 2, 2]
            ]
        ]
    ]
)

hori_dimension = (w.shape[3] - f.shape[3]) // 1 + 1
vert_dimension = (w.shape[2] - f.shape[2]) // 1 + 1
r = np.zeros(shape=(w.shape[0], f.shape[0], vert_dimension, hori_dimension))
for i in range(vert_dimension):
    for j in range(hori_dimension):
        r[:, :, i, j] += np.sum(w[:, :, i:i+3, j:j+3] * f, axis=(1, 2, 3))
print(r)

This doesn't work; the case of K kernels applied to N images is the part I'm having trouble with.
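The multiplication in the attempt above fails because the patch has shape (4, 3, 3, 3) and `f` has shape (2, 3, 3, 3), and NumPy cannot broadcast a leading axis of 4 against one of 2. A minimal sketch of one way around it, assuming stride 1 and no padding, is to give the patch an extra "kernel" axis and the kernels an extra "image" axis. The function name `conv_batch` is made up for this illustration, and `w` and `f` are rebuilt compactly with the same values as in the question.

```python
import numpy as np


def conv_batch(images, kernels):
    """Valid cross-correlation, stride 1 (hypothetical helper).

    images:  (N, C, H, W)
    kernels: (K, C, kh, kw)
    returns: (N, K, H - kh + 1, W - kw + 1)
    """
    n, c, h, w_ = images.shape
    k, kc, kh, kw = kernels.shape
    assert c == kc
    out = np.zeros((n, k, h - kh + 1, w_ - kw + 1))
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            # (N, 1, C, kh, kw) * (1, K, C, kh, kw) -> (N, K, C, kh, kw)
            patch = images[:, None, :, i:i + kh, j:j + kw]
            out[:, :, i, j] = np.sum(patch * kernels[None], axis=(2, 3, 4))
    return out


# Same data as in the question: each channel plane is filled with one value.
channel_values = [(1, 0, 2), (0, 1, 2), (2, 0, 1), (2, 1, 0)]
w = np.array([[np.full((4, 4), v) for v in img] for img in channel_values])
f = np.stack([np.full((3, 3, 3), 1), np.full((3, 3, 3), 2)])

r = conv_batch(w, f)
print(r.shape)  # (4, 2, 2, 2)
```

Every image here contains the channel values 0, 1 and 2 in some order, so each entry is 27 for the all-ones kernel and 54 for the all-twos kernel.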

However, if I have just one kernel, I can define the operation like this (and it works fine):

hori_dimension = (w.shape[3] - f.shape[3]) // 1 + 1
vert_dimension = (w.shape[2] - f.shape[2]) // 1 + 1
r = np.zeros(shape=(w.shape[0], vert_dimension, hori_dimension))
for i in range(vert_dimension):
    for j in range(hori_dimension):
        r[:, i, j] += np.sum(w[:, :, i:i+3, j:j+3] * f, axis=(1, 2, 3))
print(r)

This gives a 2x2 feature map for each image:

[[[27. 27.]
  [27. 27.]]

 [[27. 27.]
  [27. 27.]]

 [[27. 27.]
  [27. 27.]]

 [[27. 27.]
  [27. 27.]]]

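This seems correct: I have 1 kernel, so I get 1 feature map per image, resulting in 4 two-dimensional feature maps. For the multi-kernel case above I would expect one more dimension, that is, 4 feature maps for each kernel, but I can't figure it out.

@Update:

This seems to do the job: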

p = 0
s = 1
number_of_input_images, number_of_image_channels, height_of_image, width_of_image = w.shape
number_of_kernels, number_of_kernel_channels, height_of_kernel, width_of_kernel = f.shape
assert(number_of_image_channels == number_of_kernel_channels)

width_of_features = (width_of_image - width_of_kernel + 2*p) // s + 1
height_of_features = (height_of_image - height_of_kernel + 2*p) // s + 1
feature_maps = np.zeros(shape=(number_of_input_images, number_of_kernels, height_of_features, width_of_features))

for k in range(f.shape[0]):
    for i in range(height_of_features):
        for j in range(width_of_features):
            feature_maps[:, k, i, j] += np.sum(w[:, :, i:i+3, j:j+3] * f[k], axis=(1, 2, 3))

print(feature_maps)

It produces the following feature maps:

[
    # pic1
    [
        # kernel1
        [
            [27. 27.]
            [27. 27.]
        ]
        # kernel2
        [
            [54. 54.]
            [54. 54.]
        ]
    ]
    # pic2
    [
        #kernel1
        [
            [27. 27.]
            [27. 27.]
        ]
        #kernel2
        [
            [54. 54.]
            [54. 54.]
        ]
    ]
    #pic3
    [
        #kernel1
        [
            [27. 27.]
            [27. 27.]
        ]
        #kernel2
        [
            [54. 54.]
            [54. 54.]
        ]
    ]
    #pic4
    [
        #kernel1
        [
            [27. 27.]
            [27. 27.]
        ]
        #kernel2
        [
            [54. 54.]
            [54. 54.]
        ]
    ]
]

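Is there a better way? Is this the right one? To me it looks fine. With one image and multiple kernels, is the result of the convolution the feature maps from each kernel stacked one after another? So, with K kernels each producing an N x N feature map, the output of the convolutional layer would be K x N x N. If so, the above seems right, I guess? Like I said, I really do get lost in all these dimensions...

@Update:

In the end I arrived at the following code for valid (forward) / full (backprop) convolution: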
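On the "better way" question, one possible loop-free variant is sketched below. It assumes NumPy 1.20 or newer (for `sliding_window_view`), stride 1 and no padding, and it computes the same cross-correlation as the loops above. The function name `conv_batch_einsum` is made up for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_batch_einsum(images, kernels):
    """Valid cross-correlation with stride 1 and no Python loops.

    images:  (N, C, H, W)
    kernels: (K, C, kh, kw)
    returns: (N, K, H - kh + 1, W - kw + 1)
    """
    kh, kw = kernels.shape[2:]
    # windows has shape (N, C, H - kh + 1, W - kw + 1, kh, kw)
    windows = sliding_window_view(images, (kh, kw), axis=(2, 3))
    # Sum over channels (c) and kernel positions (i, j) for every
    # image n, kernel k and output position (h, w).
    return np.einsum('nchwij,kcij->nkhw', windows, kernels)


images = np.ones((4, 3, 4, 4))
kernels = np.stack([np.full((3, 3, 3), 1.0), np.full((3, 3, 3), 2.0)])
out = conv_batch_einsum(images, kernels)
print(out.shape)  # (4, 2, 2, 2)
```

The output shape confirms the layout asked about: for each of the N images there are K feature maps, one per kernel, stacked along the second axis.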

import typing

import numpy as np


def convolve(sources: np.ndarray,
             kernels: np.ndarray,
             mode: str = 'valid',
             padding: typing.Tuple[int, int] = (0, 0),
             stride: int = 1):
    number_of_input_images, number_of_image_channels, height_of_image, width_of_image = sources.shape
    number_of_kernels, number_of_kernel_channels, height_of_kernel, width_of_kernel = kernels.shape
    assert(number_of_image_channels == number_of_kernel_channels)

    if mode == 'full':
        # Full convolution pads every side with (kernel size - 1) zeros
        padding = (height_of_kernel - 1, width_of_kernel - 1)

    if any(padding):
        sources = np.pad(sources,
                         ((0, 0), (0, 0), (padding[0], padding[0]), (padding[1], padding[1])),
                         mode='constant', constant_values=0)

    # Rotate every kernel by 180 degrees in the (height, width) plane
    kernels = np.rot90(kernels, k=2, axes=(2, 3))

    width_of_features = (width_of_image - width_of_kernel + 2*padding[1]) // stride + 1
    height_of_features = (height_of_image - height_of_kernel + 2*padding[0]) // stride + 1
    feature_maps = np.zeros(shape=(number_of_input_images, number_of_kernels, height_of_features, width_of_features))

    for k in range(number_of_kernels):
        for i in range(height_of_features):
            for j in range(width_of_features):
                # Top-left corner of the current patch, taking the stride into account
                top, left = i * stride, j * stride
                patch = sources[:, :, top:top+height_of_kernel, left:left+width_of_kernel]
                feature_maps[:, k, i, j] = np.einsum('ncij,cij->n', patch, kernels[k])

    return feature_maps

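Any feedback would be greatly appreciated. I read that the kernel has to be rotated when doing a convolution, so I rotate it by 90 degrees twice. The function also supports custom padding, and to get a full convolution I pad with (kernel size - 1) so that all the surrounding elements are zero and there are no index errors.

2 Answers:

Answer 0: (score: 1)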
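The two points in that paragraph can be checked in isolation. The snippet below is only a sanity check on made-up data: it confirms that rotating by 90 degrees twice is the same as reversing both spatial axes, and that padding by (kernel size - 1) with stride 1 gives an output of size image size + kernel size - 1.

```python
import numpy as np

# Arbitrary test kernels of shape (K, C, kh, kw)
kernels = np.arange(2 * 3 * 3 * 3).reshape(2, 3, 3, 3)

# Rotating twice by 90 degrees in the (height, width) plane ...
rotated = np.rot90(kernels, k=2, axes=(2, 3))
# ... is the same as reversing both spatial axes.
flipped = kernels[:, :, ::-1, ::-1]
same = np.array_equal(rotated, flipped)

# Full mode: pad each side by (kernel size - 1), stride 1.
h, kh = 4, 3
pad = kh - 1
full_size = (h - kh + 2 * pad) // 1 + 1  # equals h + kh - 1

print(same, full_size)  # True 6
```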

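Let's look at one image and one kernel at a time. If your image is of size w x h, the kernel is of size f*f, you move with a stride of s pixels, and you pad the image with p pixels, then convolving 1 image with 1 kernel results in an image of size ((w-f+2*p)/s + 1, (h-f+2*p)/s + 1). In your case w=h=4, f=3, s=1, p=0.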
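As a quick check of that formula, here is a small sketch. The helper name `conv_output_size` is made up for this illustration, and integer division is assumed for strides that do not divide evenly.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output length along one axis: (size - kernel + 2*padding) // stride + 1."""
    return (size - kernel + 2 * padding) // stride + 1


print(conv_output_size(4, 3))             # 2  (the case in the question)
print(conv_output_size(4, 3, padding=1))  # 4  (padding 1 keeps the size)
print(conv_output_size(7, 3, stride=2))   # 3
```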

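  1. First, you extract an f*f patch from the image. Since you have 3 channels, each patch will have 3 channels
  2. The kernel multiplies each of its channels with the corresponding channel of the patch (element-wise multiplication)
  3. Finally, all the numbers across all channels are added up to give a single number.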
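The three steps above can be followed one by one for a single patch. This sketch uses the values of the first image from the question (channels filled with 1, 0 and 2) and an all-ones kernel; the variable names are only for illustration.

```python
import numpy as np

# First image from the question: three 4x4 channels filled with 1, 0 and 2
image = np.stack([np.full((4, 4), v) for v in (1, 0, 2)])  # (3, 4, 4)
kernel = np.ones((3, 3, 3))                                # (3, 3, 3)

# Step 1: extract the top-left 3x3 patch, keeping all 3 channels
patch = image[:, 0:3, 0:3]

# Step 2: multiply every channel of the patch with the matching kernel channel
product = patch * kernel

# Step 3: add everything up; per channel this is 9, 0 and 18
per_channel = product.sum(axis=(1, 2))
value = per_channel.sum()

print(per_channel, value)
```

The single number `value` (27 here) is one entry of the feature map; sliding the patch across the image fills in the rest.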

Pictorial representation: (image not available)

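By striding across the image you create multiple such patches, and since each patch produces one number with the kernel, you end up with a matrix of numbers, one for every patch that makes up the image.

This is done for each image, so you end up with one smaller convolved image per input image.

Code sample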

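import numpy as np

images = np.ones((2, 3, 4, 4))  # 2 images, 3 channels, 4x4 pixels
kernel = np.ones((3, 3, 3))     # 1 kernel, 3 channels, 3x3 pixels
w = 4  # image width and height
f = 3  # kernel size
p = 0  # padding
s = 1  # stride
out = (w - f + 2*p) // s + 1    # output width and height
r = np.zeros((2, out, out))
for n, image in enumerate(images):
    for i in range(out):
        for j in range(out):
            # Multiply the patch with the kernel element-wise and sum over all channels
            r[n, i, j] = np.sum(image[:, i*s:i*s+f, j*s:j*s+f] * kernel)
print(r)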

Output:

[[[27. 27.]
  [27. 27.]]

 [[27. 27.]
  [27. 27.]]]

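Convolving 2 images of size 4x4 with a kernel of size 3x3 gives you 2 images of size 2x2 (check: ((4-3+0)/1 + 1, (4-3+0)/1 + 1)).

Must-read resource: CS231n

Answer 1: (score: 0)

I ended up with the following code to handle batch convolution: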

def convolve(input_images, kernels, padding=0, stride=1):
    number_of_images, image_height, image_width, image_channels = input_images.shape
    number_of_kernels, kernel_height, kernel_width, kernel_depth = kernels.shape
    assert (image_channels == kernel_depth)

    input_images = np.pad(input_images, ((0, 0), (padding, padding), (padding, padding), (0, 0)),
                          mode='constant', constant_values=(0,))
    kernels = np.rot90(kernels, k=2, axes=(1, 2))

    fm_height = (image_height - kernel_height + 2*padding) // stride + 1
    fm_width = (image_width - kernel_width + 2*padding) // stride + 1
    feature_maps = np.zeros(shape=(number_of_images, fm_height, fm_width, number_of_kernels))
    for n in range(number_of_images):
        for i in range(fm_height):
            for j in range(fm_width):
                x = input_images[n, i * stride:i * stride + kernel_height, j * stride:j * stride + kernel_width, :]
                feature_maps[n, i, j, :] = np.einsum('ijk,mijk', x, kernels)
    return feature_maps
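Note that this version uses a channels-last layout (images, height, width, channels), unlike the channels-first arrays in the question. The `einsum` call is the least obvious line, so here is a small check on random data, not part of the answer itself, showing that `'ijk,mijk'` with the output written explicitly as `->m` is the sum of the element-wise product of the patch with each kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 3, 3))           # one patch: (height, width, channels)
kernels = rng.standard_normal((2, 3, 3, 3))  # (kernels, height, width, channels)

# One number per kernel m: sum over i, j, k of x[i, j, k] * kernels[m, i, j, k]
via_einsum = np.einsum('ijk,mijk->m', x, kernels)
via_loop = np.array([np.sum(x * k) for k in kernels])

print(np.allclose(via_einsum, via_loop))  # True
```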