Using Conv2DTranspose to output twice its input shape

Date: 2020-05-15 10:18:17

Tags: python tensorflow conv-neural-network

I'm new to Python 3.7.7 and TensorFlow 2.1.0 and I'm trying to understand Conv2DTranspose. I have tried this code:

from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2DTranspose
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def vgg16_decoder(input_size = (7, 7, 512)):
    inputs = Input(input_size, name = 'input')

    conv1 = Conv2DTranspose(512, (2, 2), dilation_rate = 2, name = 'conv1')(inputs)

    model = Model(inputs = inputs, outputs = conv1, name = 'vgg-16_decoder')

    opt = Adam(lr=0.001)
    model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

    return model

Here is its summary:

Model: "vgg-16_decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 7, 7, 512)         0
_________________________________________________________________
conv1 (Conv2DTranspose)      (None, 9, 9, 512)         1049088
=================================================================
Total params: 1,049,088
Trainable params: 1,049,088
Non-trainable params: 0
_________________________________________________________________

But I expect the output of conv1 to be (None, 14, 14, 512).

I changed the kernel size to (3, 3) and got this summary:

Model: "vgg-16_decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 7, 7, 512)         0
_________________________________________________________________
conv1 (Conv2DTranspose)      (None, 11, 11, 512)       2359808
=================================================================
Total params: 2,359,808
Trainable params: 2,359,808
Non-trainable params: 0
_________________________________________________________________

I am trying to use Conv2DTranspose to replicate this:

# A piece of code from U-NET implementation

up6 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal', name = 'up6')(UpSampling2D(size = (2,2), name = 'upsp1')(drop5))

And its summary:

drop5 (Dropout)                 (None, 16, 16, 1024) 0           conv5_2[0][0]
__________________________________________________________________________________________________
upsp1 (UpSampling2D)            (None, 32, 32, 1024) 0           drop5[0][0]
__________________________________________________________________________________________________
up6 (Conv2D)                    (None, 32, 32, 512)  2097664     upsp1[0][0]
__________________________________________________________________________________________________

It upsamples its input by a factor of 2 and changes the number of filters.

How can I do this with Conv2DTranspose?

UPDATE

I think I've done it, or at least I think so, but I don't understand what I did:

conv1 = Conv2DTranspose(512, (2, 2), strides = 2, name = 'conv1')(inputs)

With the previous statement, I get this summary:

Model: "vgg-16_decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 7, 7, 512)         0
_________________________________________________________________
conv1 (Conv2DTranspose)      (None, 14, 14, 512)       1049088
=================================================================
Total params: 1,049,088
Trainable params: 1,049,088
Non-trainable params: 0
_________________________________________________________________

You are welcome to correct me or explain what I have done here.

UPDATE 2

By the way, I'm trying to create a VGG-16 decoder. Here is the code for my VGG-16 encoder:

from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def vgg16_encoder(input_size = (224,224,3)):
    inputs = Input(input_size, name = 'input')

    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(inputs)
    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(conv1)
    pool1 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_1')(conv1)

    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(pool1)
    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(conv2)
    pool2 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_2')(conv2)

    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(pool2)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(conv3)
    pool3 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_3')(conv3)

    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(pool3)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(conv4)
    pool4 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_4')(conv4)

    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(pool4)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(conv5)
    pool5 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_5')(conv5)

    opt = Adam(lr=0.001)

    model = Model(inputs = inputs, outputs = pool5, name = 'vgg-16_encoder')

    model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

    return model

3 Answers:

Answer 0 (score: 3)

When we design an encoder-decoder architecture, we need operations that reverse what has already been done. So, say the encoder contains Conv2D and pooling layers (very common in architectures like VGG). We use Conv2DTranspose (which can be thought of as the reverse operation of Conv2D) and UpSampling2D (the reverse of pooling, though not strictly, since pooling loses information and is not an invertible operation).

Note: you don't want to use Conv2DTranspose to do the upsampling of the feature maps here (you can, but for VGG I don't think Conv2DTranspose alone will give you the upsampled feature maps in the decoder the way you expect); it isn't designed for that (it does learn an upsampling as well, but one with learned parameters, which is slightly different). You would end up with very large kernels, which would result in a network completely different from the VGG encoder you're talking about.

from tensorflow.keras.layers import *
from tensorflow.keras.models import *

def encoder_decoder_conv(input_size = (224,224,3)):
    ip = Input((224,224,3))
    # encoder
    conv = Conv2D(512, (3,3))(ip) # look here, the default padding is used
    # decoder
    inv_conv = Conv2DTranspose(3, (3,3))(conv)
    # simple model
    model = Model(ip, inv_conv)
    return model

model1 = encoder_decoder_conv()
model1.summary()

def encoder_decoder_pooling(input_size = (224,224,3)):
    ip = Input((224,224,3))
    # encoder
    pool = MaxPool2D((2,2))(ip) # look here, the default padding is used
    # decoder
    inv_pool = UpSampling2D((2,2))(pool)
    # simple model
    model = Model(ip, inv_pool)
    return model

model2 = encoder_decoder_pooling()
model2.summary()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 222, 222, 512)     14336     
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 224, 224, 3)       13827     
=================================================================
Total params: 28,163
Trainable params: 28,163
Non-trainable params: 0
_________________________________________________________________
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 112, 112, 3)       0         
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 224, 224, 3)       0         
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0

So, as you can see in the first model, with Conv2DTranspose we reverse the convolution and get back exactly the same shape as the input, (224, 224, 3).

For the second model, we reverse the pooling operation (in terms of feature-map shape) with UpSampling2D.

So, since VGG mainly consists of Conv2D and MaxPooling2D, to build a VGG decoder you just reverse those operations with Conv2DTranspose and UpSampling2D, so that you get back the exact input shape, (224, 224, 3), from the feature-map shape (7, 7, 512).

Finally, there will be some variations in the decoder part, but I think this is the VGG-16 decoder you're looking for.

def vgg16_decoder(input_size = (7,7,512)):
    inputs = Input(input_size, name = 'input')

    pool5 = UpSampling2D((2,2), name = 'pool_5')(inputs)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(pool5)

    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)

    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(conv5)

    pool4 = UpSampling2D((2,2), name = 'pool_4')(conv5)

    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(pool4)

    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(conv4)
    pool3 = UpSampling2D((2,2), name = 'pool_3')(conv4)

    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(pool3)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)

    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(conv3)

    pool2 = UpSampling2D((2,2), name = 'pool_2')(conv3)
    conv2 = Conv2DTranspose(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(pool2)

    conv2 = Conv2DTranspose(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(conv2)

    pool1 = UpSampling2D((2,2), name = 'pool_1')(conv2)

    conv1 = Conv2DTranspose(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(pool1)

    conv1 = Conv2DTranspose(3, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(conv1) # to get 3 channels

    model = Model(inputs = inputs, outputs = conv1, name = 'vgg-16_encoder')

    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    return model

model = vgg16_decoder()
model.summary()
Model: "vgg-16_encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           [(None, 7, 7, 512)]       0         
_________________________________________________________________
pool_5 (UpSampling2D)        (None, 14, 14, 512)       0         
_________________________________________________________________
conv5_3 (Conv2DTranspose)    (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv5_2 (Conv2DTranspose)    (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv5_1 (Conv2DTranspose)    (None, 14, 14, 512)       2359808   
_________________________________________________________________
pool_4 (UpSampling2D)        (None, 28, 28, 512)       0         
_________________________________________________________________
conv4_3 (Conv2DTranspose)    (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv4_2 (Conv2DTranspose)    (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv4_1 (Conv2DTranspose)    (None, 28, 28, 512)       2359808   
_________________________________________________________________
pool_3 (UpSampling2D)        (None, 56, 56, 512)       0         
_________________________________________________________________
conv3_3 (Conv2DTranspose)    (None, 56, 56, 256)       1179904   
_________________________________________________________________
conv3_2 (Conv2DTranspose)    (None, 56, 56, 256)       590080    
_________________________________________________________________
conv3_1 (Conv2DTranspose)    (None, 56, 56, 256)       590080    
_________________________________________________________________
pool_2 (UpSampling2D)        (None, 112, 112, 256)     0         
_________________________________________________________________
conv2_2 (Conv2DTranspose)    (None, 112, 112, 128)     295040    
_________________________________________________________________
conv2_1 (Conv2DTranspose)    (None, 112, 112, 128)     147584    
_________________________________________________________________
pool_1 (UpSampling2D)        (None, 224, 224, 128)     0         
_________________________________________________________________
conv1_2 (Conv2DTranspose)    (None, 224, 224, 64)      73792     
_________________________________________________________________
conv1_1 (Conv2DTranspose)    (None, 224, 224, 3)       1731      
=================================================================
Total params: 17,037,059
Trainable params: 17,037,059
Non-trainable params: 0

It takes the (7, 7, 512) feature-map shape and reconstructs the original image dimensions, (224, 224, 3).
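As a quick sanity check (a minimal sketch, not part of the original answer, assuming the vgg16_encoder from the question and the vgg16_decoder above are both in scope), you can chain the two models on a dummy image and verify the shapes:

import numpy as np

# Encode a dummy image down to (7, 7, 512), then decode it back to (224, 224, 3)
encoder = vgg16_encoder()   # (224, 224, 3) -> (7, 7, 512)
decoder = vgg16_decoder()   # (7, 7, 512)   -> (224, 224, 3)

dummy = np.random.rand(1, 224, 224, 3).astype('float32')
features = encoder.predict(dummy)
reconstruction = decoder.predict(features)

print(features.shape)        # (1, 7, 7, 512)
print(reconstruction.shape)  # (1, 224, 224, 3)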

To sum up, the mechanical way to design the decoder is to walk backwards through the encoder, applying the reverse of each operation. For more details on Conv2DTranspose and UpSampling2D, if you want to understand these concepts more deeply:

https://cs231n.github.io/convolutional-networks/

https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers

https://www.matthewzeiler.com/mattzeiler/deconvolutionalnetworks.pdf

Answer 1 (score: 1)

To get the desired shape, you need:

conv1 = Conv2DTranspose(512, (8, 8), strides = 1, name = 'conv1')(inputs)

With the default padding='valid' and a stride of 1, a transposed convolution produces an output of size input + kernel - 1, so an (8, 8) kernel gives 7 + 8 - 1 = 14. You may find this article about transposed convolution operations useful: https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
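A minimal sketch to verify this (not from the original answer; the layer and variable names are just for illustration):

from tensorflow.keras.layers import Input, Conv2DTranspose
from tensorflow.keras.models import Model

# Single transposed-conv layer: with the default padding='valid' and stride 1,
# the output spatial size is 7 + 8 - 1 = 14.
inputs = Input((7, 7, 512))
conv1 = Conv2DTranspose(512, (8, 8), strides = 1, name = 'conv1')(inputs)
Model(inputs, conv1).summary()   # conv1 (Conv2DTranspose)  (None, 14, 14, 512)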

Answer 2 (score: 0)

conv1 = Conv2DTranspose(512, (2, 2), strides = 2, name = 'conv1')(inputs) works because you are using a stride of 2. In a normal convolution, this means the filter is only applied at every second position (one position is skipped each time), which halves the output size relative to the input. In a transposed convolution, however, it is the other way round, and a stride of 2 doubles the output size. It achieves this by essentially inserting holes into the input before applying the convolution.

Your first snippet (conv1 = Conv2DTranspose(512, (2, 2), dilation_rate = 2, name = 'conv1')(inputs)) does not do what you want because you specified a dilation of 2, not a stride. That is something completely different. Dilation inserts "holes" into your filter: for example, a filter that looks like [x1 x2 x3] becomes [x1 0 x2 0 x3] with a dilation of 2. This dilated filter is then applied to the input in the normal way.

Why does the output size change at all, even with dilation? That is due to padding. Normally, without padding, the output of a convolution is smaller than its input; in a transposed convolution it becomes larger. You can use padding='same' to avoid this.

tl;dr: You can double the image dimensions with Conv2DTranspose(n_filters, filter_size, strides = 2, padding = "same").
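A minimal sketch illustrating the difference between the two settings (not part of the original answer; assumes TF 2.x):

from tensorflow.keras.layers import Input, Conv2DTranspose
from tensorflow.keras.models import Model

inp = Input((7, 7, 512))

# stride 2 with padding='same' doubles the spatial size: 7 -> 14
doubled = Conv2DTranspose(512, (2, 2), strides = 2, padding = 'same')(inp)

# dilation 2 with the default padding='valid' dilates the kernel instead: 7 -> 9
dilated = Conv2DTranspose(512, (2, 2), dilation_rate = 2)(inp)

Model(inp, [doubled, dilated]).summary()
# doubled: (None, 14, 14, 512), dilated: (None, 9, 9, 512)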