ResNet的Stage-1中的BatchNorm图层连接到所有其他BatchNorm图层。为什么?

时间:2018-06-12 08:58:26

标签: keras deep-learning batch-normalization resnet

这里我给出了我实施的ResNet模型的一些截图。使用TensorBoard生成的图表。

tensorflow在后端是否进行了某种优化?

我使用Keras实现了代码。

模型中有两个块。 IdentityBlock和ConvolutionalBlock。 添加这些块的代码会导致StackOverflow出现问题(您的帖子主要是代码)

在ResNet函数(def ResNet)中,我使用了BatchNormalization并给它命名为'bnl_stg-1',我只传递了一个输​​入(X)。但由于某种原因,它连接到身份和卷积块中的所有BatchNorm图层,如图所示。

以下是代码:

def ResNet(input_shape, features):

'''
Implements the ResNet50 Model
[Conv2D -> BatchNorm -> ReLU -> MaxPool2D] --> [ConvBlock -> IdentityBlock * 2] --> [ConvBlock -> IdentityBlock * 3] --> [AveragePool2D -> Flatten -> Dense -> Sigmoid]
'''   
X_input = Input(input_shape)

X = ZeroPadding2D((3, 3))(X_input)

# Stage 1
X = Conv2D(filters = 64, 
           kernel_size = (7, 7), 
           strides = (2, 2), 
           name = 'cnl_stg-1', 
           kernel_initializer = 'glorot_uniform')(X)

X = BatchNormalization(axis = 3, 
                       name = 'bnl_stg-1')(X)

X = Activation('relu')(X)

X = MaxPooling2D(pool_size=(3, 3), 
                 strides=(2, 2))(X)

# Stage 2
X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, s = 1)
X = identity_block(X, 3, [64, 64, 256], stage=2, block=1)
X = identity_block(X, 3, [64, 64, 256], stage=2, block=2)

# Stage 3 
X = convolutional_block(X, f = 3, filters = [128, 128, 512], stage = 3, s = 2)
X = identity_block(X, 3, [128, 128, 512], stage = 3, block = 1)
X = identity_block(X, 3, [128, 128, 512], stage = 3, block = 2)
X = identity_block(X, 3, [128, 128, 512], stage = 3, block = 3)

#Final Stage
X = AveragePooling2D(pool_size = (2, 2),
                     strides = (2, 2))(X)
X = Flatten()(X)
X = Dense(features, activation='sigmoid', name='fc' + str(features), kernel_initializer = 'glorot_uniform')(X)

# Create model
model = Model(inputs = X_input, outputs = X, name='ResNet')

return model

Snapshot of the Graph

bnl-stg-1(BatchNormLayer-Stage-1)

1 个答案:

答案 0 :(得分:0)

你不应该担心它。批量标准化行为在训练和学习之间发生变化,因此Keras添加了一个布尔变量来控制它(keras_learning_phase如果我记得很清楚)。这就是所有这些层都连接起来的原因。 Dropout图层可能会出现类似的行为。