Question

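I am implementing the architecture from this paper in TensorFlow: https://cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

My input has shape 224, 224, 3, and I have the TensorFlow layers below. The problem is that the output of conv1 is 109 x 109 x 96 rather than the 110 x 110 x 96 stated in the paper. How can I fix this?

I followed the hyperparameters specified on page 8 of the paper. My only idea is that the padding may be incorrect (since TensorFlow sets it for you).

My code is as follows: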

# Input Layer
# Reshape X to 4-D tensor: [batch_size, height, width, channels]
input_layer = tf.reshape(features["x"], [-1, IMG_SIZE, IMG_SIZE, 3])

print(input_layer.get_shape(), '224, 224, 3')
# Convolutional Layer #1
# Input Tensor Shape: [batch_size, 224, 224, 3]
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=96,
    kernel_size=[7, 7],
    strides=2,
    padding="valid",  # padding = 1
    activation=tf.nn.relu)
print(conv1.get_shape(), '110, 110, 96')

# Max Pooling Layer
# Input Tensor Shape: [batch_size, 110, 110, 96]
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[3, 3], strides=2)

# Contrast Normalisation
# Input Tensor Shape: [batch_size, 55, 55, 96]
contrast_norm1 = tf.nn.local_response_normalization(
    pool1,
    depth_radius=5,
    bias=1,
    alpha=0.0001,
    beta=0.75)
print(contrast_norm1.get_shape(), '55, 55, 96')

# The rest of the CNN...

Output (actual shape in parentheses, followed by the desired shape from the paper):

(?, 224, 224, 3) 224, 224, 3  # Input
(?, 109, 109, 96) 110, 110, 96  # Following conv1
(?, 54, 54, 96) 55, 55, 96  # Following contrast_norm1

Answer 1

The output height and width of a convolution with valid padding can be computed as:

output_size = (input_size - kernel_size) // stride + 1

In your case, the output of the first layer is

output_size = (224 - 7) // 2 + 1 = 217 // 2 + 1 = 109
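
As a quick check, the formula can be written as a small helper and applied to the layers in the question. This is only a sketch: the function name valid_output_size is made up for illustration, and it assumes the pooling layer also uses valid padding, which is the default for tf.layers.max_pooling2d.

```python
def valid_output_size(input_size, kernel_size, stride):
    """Output length along one spatial dimension with 'valid' padding."""
    return (input_size - kernel_size) // stride + 1


# conv1 from the question: 224 input, 7x7 kernel, stride 2
conv1_size = valid_output_size(224, 7, 2)

# pool1 from the question: 3x3 window, stride 2, valid padding (assumed default)
pool1_size = valid_output_size(conv1_size, 3, 2)

print(conv1_size, pool1_size)  # 109 54
```

Both values match the shapes printed in the question, (?, 109, 109, 96) and (?, 54, 54, 96).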

One way to make the first layer's output equal to 110 is to set the kernel size to 6x6. Another is to add padding of size 1 with tf.pad:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.InteractiveSession)

# suppose this is a batch of 10 images of size 4x4x3
data = np.ones((10, 4, 4, 3), dtype=np.float32)

paddings = [[0, 0], # no values are added along batch dim
            [1, 0], # add one value before the content of height dim
            [1, 0], # add one value before the content of width dim
            [0, 0]] # no values are added along channel dim

padded_data = tf.pad(tensor=data,
                     paddings=paddings,
                     mode='CONSTANT',
                     constant_values=0)

sess = tf.InteractiveSession()
output = sess.run(padded_data)
print(output.shape)
# >>> (10, 5, 5, 3)

# print content of first channel of first image
print(output[0,:,:,0])
# >>> [[0. 0. 0. 0. 0.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]
#      [0. 1. 1. 1. 1.]]

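In the example above, zero padding of size 1 is added along the height and width dimensions. The paddings argument should have shape [number_of_dimensions, 2], i.e. for each dimension of the input tensor you specify how many values to add before and after its content.

Applying this padding to the input data produces a tensor of shape batch x 225 x 225 x 3, so the output height and width of the convolutional layer will be 110x110.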
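
The arithmetic for the padded input can be checked the same way. The sketch below is illustrative only; the helper names are hypothetical. It also looks at the pooling layer, which the answer above does not cover: with valid padding a 3x3, stride-2 pool on a 110-wide input still gives 54, whereas TensorFlow's "same" padding gives ceil(input_size / stride) = 55. Using padding="same" on the pooling layer is an assumption about how to reach the paper's 55, not something stated in the paper or the answer.

```python
import math


def valid_output_size(input_size, kernel_size, stride):
    """Output length along one spatial dimension with 'valid' padding."""
    return (input_size - kernel_size) // stride + 1


def same_output_size(input_size, stride):
    """Output length along one spatial dimension with 'same' padding."""
    return math.ceil(input_size / stride)


# One value added before the content, none after, as in the paddings list above
pad_before, pad_after = 1, 0
padded_size = 224 + pad_before + pad_after

conv1_size = valid_output_size(padded_size, 7, 2)   # 7x7 kernel, stride 2
pool1_valid = valid_output_size(conv1_size, 3, 2)   # 3x3 pool, stride 2, valid
pool1_same = same_output_size(conv1_size, 2)        # stride-2 pool, same

print(padded_size, conv1_size, pool1_valid, pool1_same)  # 225 110 54 55
```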
