I am new to Python 3.7.7 and Tensorflow 2.1.0, and I am trying to understand Conv2DTranspose. I have tried this code:
from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2DTranspose
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def vgg16_decoder(input_size = (7, 7, 512)):
    inputs = Input(input_size, name = 'input')
    conv1 = Conv2DTranspose(512, (2, 2), dilation_rate = 2, name = 'conv1')(inputs)
    model = Model(inputs = inputs, outputs = conv1, name = 'vgg-16_decoder')
    opt = Adam(lr=0.001)
    model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
    return model
Here is its summary:
Model: "vgg-16_decoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input (InputLayer) (None, 7, 7, 512) 0 _________________________________________________________________ conv1 (Conv2DTranspose) (None, 9, 9, 512) 1049088 ================================================================= Total params: 1,049,088 Trainable params: 1,049,088 Non-trainable params: 0 _________________________________________________________________
But I expect an output of (None, 14, 14, 512) from conv1.
I changed the filter size to (3, 3) and got this summary:
Model: "vgg-16_decoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input (InputLayer) (None, 7, 7, 512) 0 _________________________________________________________________ conv1 (Conv2DTranspose) (None, 11, 11, 512) 2359808 ================================================================= Total params: 2,359,808 Trainable params: 2,359,808 Non-trainable params: 0 _________________________________________________________________
This is what I am trying to do with Conv2DTranspose:
# A piece of code from U-NET implementation
up6 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal', name = 'up6')(UpSampling2D(size = (2,2), name = 'upsp1')(drop5))
And here is its summary:
drop5 (Dropout)                 (None, 16, 16, 1024)  0           conv5_2[0][0]
__________________________________________________________________________________________________
upsp1 (UpSampling2D)            (None, 32, 32, 1024)  0           drop5[0][0]
__________________________________________________________________________________________________
up6 (Conv2D)                    (None, 32, 32, 512)   2097664     upsp1[0][0]
__________________________________________________________________________________________________
It upsamples the input by a factor of 2 and changes the number of filters.
How can I do that with Conv2DTranspose?
UPDATE:
I think I did it, or at least I think I did, but I do not understand what I did:
conv1 = Conv2DTranspose(512, (2, 2), strides = 2, name = 'conv1')(inputs)
With the previous statement I got this summary:
Model: "vgg-16_decoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input (InputLayer) (None, 7, 7, 512) 0 _________________________________________________________________ conv1 (Conv2DTranspose) (None, 14, 14, 512) 1049088 ================================================================= Total params: 1,049,088 Trainable params: 1,049,088 Non-trainable params: 0 _________________________________________________________________
If you want to correct me or explain what I did here, you are welcome to.
UPDATE 2:
By the way, I am trying to create a VGG-16 decoder. Here is the code of my VGG-16 encoder:
def vgg16_encoder(input_size = (224,224,3)):
    inputs = Input(input_size, name = 'input')
    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(inputs)
    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(conv1)
    pool1 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_1')(conv1)
    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(pool1)
    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(conv2)
    pool2 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_2')(conv2)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(pool2)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(conv3)
    pool3 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_3')(conv3)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(pool3)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(conv4)
    pool4 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_4')(conv4)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(pool4)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(conv5)
    pool5 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_5')(conv5)
    opt = Adam(lr=0.001)
    model = Model(inputs = inputs, outputs = pool5, name = 'vgg-16_encoder')
    model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])
    return model
Answer 0 (score: 3)
When we design an encoder-decoder architecture, we need operations that reverse the operations that have already been done. So, say the encoder contains Conv2D and pooling (which is common in architectures such as VGG). We use Conv2DTranspose (which can be thought of as the reverse operation of Conv2D) and UpSampling2D (the reverse of pooling (well, not strictly [pooling is not an invertible operation, as information is lost])).
Note: you do not want to use Conv2DTranspose to do the upsampling of the feature maps (you can, but for VGG I do not think Conv2DTranspose will give you the upsampled feature maps in the decoder the way you want); it was not designed that way (it can also upsample, but it learns the best upsampling parameters, which is slightly different). You would end up with very large kernels, which would lead to a network completely different from the VGG encoder you are talking about.
from tensorflow.keras.layers import *
from tensorflow.keras.models import *

def encoder_decoder_conv(input_size = (224,224,3)):
    ip = Input((224,224,3))
    # encoder
    conv = Conv2D(512, (3,3))(ip) # look here, the default padding is used
    # decoder
    inv_conv = Conv2DTranspose(3, (3,3))(conv)
    # simple model
    model = Model(ip, inv_conv)
    return model

model1 = encoder_decoder_conv()
model1.summary()
def encoder_decoder_pooling(input_size = (224,224,3)):
    ip = Input((224,224,3))
    # encoder
    pool = MaxPool2D((2,2))(ip) # look here, the default padding is used
    # decoder
    inv_pool = UpSampling2D((2,2))(pool)
    # simple model
    model = Model(ip, inv_pool)
    return model

model2 = encoder_decoder_pooling()
model2.summary()
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 222, 222, 512) 14336
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 224, 224, 3) 13827
=================================================================
Total params: 28,163
Trainable params: 28,163
Non-trainable params: 0
_________________________________________________________________
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 112, 112, 3) 0
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 224, 224, 3) 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
So, you can see in the first model that with Conv2DTranspose we reverse the operation and get back exactly the same shape as the input, (224, 224, 3).
For the second model, we reverse the pooling operation (in terms of feature-map shape) with UpSampling2D.
So, as you are trying to build a VGG decoder, and VGG consists mostly of Conv2D and MaxPooling2D, you just use Conv2DTranspose and UpSampling2D to reverse those operations, so that you get back the exact input shape (224, 224, 3) from the feature-map shape (7, 7, 512).
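As a quick sanity check of that shape bookkeeping, here is a minimal sketch of a single (hypothetical) decoder stage: UpSampling2D((2, 2)) doubles the spatial size and a 'same'-padded Conv2DTranspose keeps it, so five such stages take 7 -> 14 -> 28 -> 56 -> 112 -> 224.

from tensorflow.keras.layers import Input, Conv2DTranspose, UpSampling2D

# One decoder stage: UpSampling2D doubles height/width,
# a 'same'-padded Conv2DTranspose leaves them unchanged.
x = Input((7, 7, 512))
y = UpSampling2D((2, 2))(x)                            # (None, 14, 14, 512)
y = Conv2DTranspose(512, (3, 3), padding = 'same')(y)  # (None, 14, 14, 512)
print(y.shape)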
Finally, there will be some variations in the decoder part, but I think this is the VGG-16 decoder you are looking for:
def vgg16_decoder(input_size = (7,7,512)):
    inputs = Input(input_size, name = 'input')
    pool5 = UpSampling2D((2,2), name = 'pool_5')(inputs)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(pool5)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(conv5)
    pool4 = UpSampling2D((2,2), name = 'pool_4')(conv5)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(pool4)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(conv4)
    pool3 = UpSampling2D((2,2), name = 'pool_3')(conv4)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(pool3)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(conv3)
    pool2 = UpSampling2D((2,2), name = 'pool_2')(conv3)
    conv2 = Conv2DTranspose(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(pool2)
    conv2 = Conv2DTranspose(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(conv2)
    pool1 = UpSampling2D((2,2), name = 'pool_1')(conv2)
    conv1 = Conv2DTranspose(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(pool1)
    conv1 = Conv2DTranspose(3, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(conv1) # to get 3 channels
    model = Model(inputs = inputs, outputs = conv1, name = 'vgg-16_encoder')
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model = vgg16_decoder()
model.summary()
Model: "vgg-16_encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 7, 7, 512)] 0
_________________________________________________________________
pool_5 (UpSampling2D) (None, 14, 14, 512) 0
_________________________________________________________________
conv5_3 (Conv2DTranspose) (None, 14, 14, 512) 2359808
_________________________________________________________________
conv5_2 (Conv2DTranspose) (None, 14, 14, 512) 2359808
_________________________________________________________________
conv5_1 (Conv2DTranspose) (None, 14, 14, 512) 2359808
_________________________________________________________________
pool_4 (UpSampling2D) (None, 28, 28, 512) 0
_________________________________________________________________
conv4_3 (Conv2DTranspose) (None, 28, 28, 512) 2359808
_________________________________________________________________
conv4_2 (Conv2DTranspose) (None, 28, 28, 512) 2359808
_________________________________________________________________
conv4_1 (Conv2DTranspose) (None, 28, 28, 512) 2359808
_________________________________________________________________
pool_3 (UpSampling2D) (None, 56, 56, 512) 0
_________________________________________________________________
conv3_3 (Conv2DTranspose) (None, 56, 56, 256) 1179904
_________________________________________________________________
conv3_2 (Conv2DTranspose) (None, 56, 56, 256) 590080
_________________________________________________________________
conv3_1 (Conv2DTranspose) (None, 56, 56, 256) 590080
_________________________________________________________________
pool_2 (UpSampling2D) (None, 112, 112, 256) 0
_________________________________________________________________
conv2_2 (Conv2DTranspose) (None, 112, 112, 128) 295040
_________________________________________________________________
conv2_1 (Conv2DTranspose) (None, 112, 112, 128) 147584
_________________________________________________________________
pool_1 (UpSampling2D) (None, 224, 224, 128) 0
_________________________________________________________________
conv1_2 (Conv2DTranspose) (None, 224, 224, 64) 73792
_________________________________________________________________
conv1_1 (Conv2DTranspose) (None, 224, 224, 3) 1731
=================================================================
Total params: 17,037,059
Trainable params: 17,037,059
Non-trainable params: 0
It takes the (7, 7, 512) feature shape and reconstructs the original image dimensions, (224, 224, 3).
To summarize: the mechanical way to design the decoder is to go in the opposite direction (with respect to the encoder), applying the reverse operations. As for the details of Conv2DTranspose and UpSampling2D, here are some resources if you want to understand these concepts more deeply:
https://cs231n.github.io/convolutional-networks/
https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers
https://www.matthewzeiler.com/mattzeiler/deconvolutionalnetworks.pdf
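For comparison, the alternative that the note above cautions about (letting Conv2DTranspose itself do the upsampling) folds an UpSampling2D + Conv2DTranspose pair into a single learnable layer with strides = 2. A minimal sketch of one such stage, not part of the decoder above:

from tensorflow.keras.layers import Input, Conv2DTranspose
from tensorflow.keras.models import Model

# strides = 2 with padding = 'same' doubles the spatial size while the layer
# also learns its upsampling weights, so (7, 7, 512) becomes (14, 14, 512).
ip = Input((7, 7, 512))
up = Conv2DTranspose(512, (3, 3), strides = 2, padding = 'same', activation = 'relu')(ip)
model = Model(ip, up)
model.summary()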
Answer 1 (score: 1)
To get the shape you want, you need
conv1 = Conv2DTranspose(512, (8, 8), strides = 1, name = 'conv1')(inputs)
You may find this article on transposed convolution operations useful: https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
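The standard output-size formula for a transposed convolution with the default 'valid' padding explains why an (8, 8) kernel is needed here: output = (input - 1) * strides + kernel_size, so (7 - 1) * 1 + 8 = 14. A quick check:

from tensorflow.keras.layers import Input, Conv2DTranspose

# 'valid' padding, stride 1: output = (7 - 1) * 1 + 8 = 14
inputs = Input((7, 7, 512))
conv1 = Conv2DTranspose(512, (8, 8), strides = 1, name = 'conv1')(inputs)
print(conv1.shape)  # (None, 14, 14, 512)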
Answer 2 (score: 0)
conv1 = Conv2DTranspose(512, (2, 2), strides = 2, name = 'conv1')(inputs)
This works because you used a stride of 2. In a normal convolution, a stride of 2 means the filter is applied only at every second position (skipping one each time), which roughly halves the size of the output relative to the input. In a transposed convolution, however, it is effectively the other way round: a stride of 2 doubles the output size. It achieves this by essentially inserting holes into the input before applying the convolution.
Your first snippet (conv1 = Conv2DTranspose(512, (2, 2), dilation_rate = 2, name = 'conv1')(inputs)) did not work because you specified a dilation of 2, not a stride of 2. That is something completely different. Dilation inserts the "holes" into the filter instead: for example, a filter that looks like [x1 x2 x3] becomes [x1 0 x2 0 x3] with a dilation of 2. This dilated filter is then applied to the input in the normal way.
Why does the output size change at all, even with dilation? That is due to the padding. Normally, if no padding is applied, the output of a convolution is smaller than its input; in a transposed convolution it becomes larger. You can avoid this with padding="same".
tl;dr: You can double the image size with Conv2DTranspose(n_filters, filter_size, strides = 2, padding="same").
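A minimal sketch comparing the three variants discussed above on the (7, 7, 512) input (the expected shapes are in the comments):

from tensorflow.keras.layers import Input, Conv2DTranspose

inputs = Input((7, 7, 512))

# dilation_rate = 2 dilates the (2, 2) kernel to an effective 3x3, so with
# the default 'valid' padding the output only grows to (None, 9, 9, 512).
dilated = Conv2DTranspose(512, (2, 2), dilation_rate = 2)(inputs)

# strides = 2 doubles the spatial size: (None, 14, 14, 512) with a (2, 2) kernel.
strided = Conv2DTranspose(512, (2, 2), strides = 2)(inputs)

# strides = 2 with padding = 'same' also doubles it, for any filter size.
strided_same = Conv2DTranspose(512, (3, 3), strides = 2, padding = 'same')(inputs)

print(dilated.shape, strided.shape, strided_same.shape)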