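I am trying to understand how custom layers work in Keras, but I am running into a problem with my model's validation accuracy.
I am trying to reproduce a simple convolutional network on the MNIST dataset, but using a custom layer that combines a Conv2D operator with BatchNormalization.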
First, the data I use:
import numpy as np
import pandas as pd
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
# One-hot encode the labels (10 columns, one per digit)
y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)
Here is the original implementation, which works well:
from keras.layers import Input, Conv2D, BatchNormalization, MaxPool2D, Flatten, Dense
from keras.models import Model
from keras.optimizers import Adam
from keras import backend as K

def get_model():
    input_ = Input(shape=(28, 28, 1))
    # Each block is Conv2D -> BatchNormalization -> MaxPool2D, chained on x
    x = Conv2D(filters=64, kernel_size=3, activation="relu")(input_)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2, 2))(x)
    x = Conv2D(filters=128, kernel_size=3, activation="relu")(x)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2, 2))(x)
    x = Conv2D(filters=256, kernel_size=3, activation="relu")(x)
    x = BatchNormalization()(x)
    x = MaxPool2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(128, activation="relu")(x)
    x = Dense(64, activation="relu")(x)
    x = Dense(10, activation="softmax")(x)
    mod = Model(inputs=input_, outputs=x)
    return mod
optim = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, clipvalue=K.epsilon())
model = get_model()
model.compile(optimizer=optim, loss='categorical_crossentropy', metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=128, epochs=3, validation_data=(X_test, y_test))
With this initial model, after 3 epochs I reach 97% training accuracy and 97% validation accuracy.
Here is my custom layer:
from keras.layers import Layer, Conv2D, BatchNormalization, Dropout

class Conv2DLayer(Layer):
    """Conv2D, optionally followed by BatchNormalization and Dropout, as one layer."""

    def __init__(self, filters, kernel_size, dropout_ratio=None, strides=(1, 1),
                 activation="relu", use_bn=True, *args, **kwargs):
        self._filters = filters
        self._kernel_size = kernel_size
        self._dropout_ratio = dropout_ratio
        self._strides = strides
        self.use_bn = use_bn
        self._activation = activation
        self._args = args
        self._kwargs = kwargs
        super(Conv2DLayer, self).__init__(*args, **kwargs)

    def build(self, input_shape):
        self.conv = Conv2D(self._filters,
                           kernel_size=self._kernel_size,
                           activation=self._activation,
                           strides=self._strides,
                           input_shape=input_shape,
                           *self._args,
                           **self._kwargs)
        self.conv.build(input_shape)
        self.out_conv_shape = self.conv.compute_output_shape(input_shape)
        # Expose the sublayers' weights as this layer's own; copy the lists so
        # that extending them below does not mutate the Conv2D's own lists
        self._trainable_weights = list(self.conv._trainable_weights)
        self._non_trainable_weights = list(self.conv._non_trainable_weights)
        if self.use_bn:
            self.bn = BatchNormalization()
            self.bn.build(self.out_conv_shape)
            self._trainable_weights.extend(self.bn._trainable_weights)
            self._non_trainable_weights.extend(self.bn._non_trainable_weights)
        if self._dropout_ratio is not None:
            self.dropout = Dropout(rate=self._dropout_ratio)
            self.dropout.build(self.out_conv_shape)
            self._trainable_weights.extend(self.dropout._trainable_weights)
            self._non_trainable_weights.extend(self.dropout._non_trainable_weights)
        super(Conv2DLayer, self).build(input_shape)

    def call(self, inputs):
        # Apply the sublayers in sequence
        x = self.conv(inputs)
        if self.use_bn:
            x = self.bn(x)
        if self._dropout_ratio is not None:
            x = self.dropout(x)
        return x

    def compute_output_shape(self, input_shape):
        # BatchNormalization and Dropout keep the Conv2D output shape
        return self.out_conv_shape
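Since compute_output_shape just returns the Conv2D output shape, it can help to trace the spatial sizes by hand. The sketch below is plain Python and assumes the Keras defaults padding="valid" for both Conv2D and MaxPool2D (with the pool stride equal to pool_size), square 28x28 inputs, and three blocks that are chained, each taking the previous block's output. The helper names are hypothetical, not Keras functions.

```python
def conv2d_output_len(size, kernel_size, stride=1):
    """Spatial length after a Conv2D with padding="valid"."""
    return (size - kernel_size) // stride + 1


def maxpool_output_len(size, pool_size=2):
    """Spatial length after MaxPool2D with padding="valid" and strides=pool_size."""
    return (size - pool_size) // pool_size + 1


def trace_shapes(size=28, filters=(64, 128, 256), kernel_size=3, pool_size=2):
    """(height, width, channels) after each conv + pool block, assuming chained blocks."""
    shapes = []
    for f in filters:
        size = conv2d_output_len(size, kernel_size)
        size = maxpool_output_len(size, pool_size)
        shapes.append((size, size, f))
    return shapes


shapes = trace_shapes()
print(shapes)      # [(13, 13, 64), (5, 5, 128), (1, 1, 256)]
h, w, c = shapes[-1]
print(h * w * c)   # 256 features reach Flatten
```

If instead every block is applied to input_, only the last one feeds Flatten: trace_shapes(filters=(256,)) gives [(13, 13, 256)], i.e. 43264 features.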
Finally, here is the modified model:
def get_model():
    input_ = Input(shape=(28, 28, 1))
    # Each Conv2DLayer bundles Conv2D + BatchNormalization; blocks are chained on x
    x = Conv2DLayer(filters=64, kernel_size=3, activation="relu")(input_)
    x = MaxPool2D(pool_size=(2, 2))(x)
    x = Conv2DLayer(filters=128, kernel_size=3, activation="relu")(x)
    x = MaxPool2D(pool_size=(2, 2))(x)
    x = Conv2DLayer(filters=256, kernel_size=3, activation="relu")(x)
    x = MaxPool2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(128, activation="relu")(x)
    x = Dense(64, activation="relu")(x)
    x = Dense(10, activation="softmax")(x)
    mod = Model(inputs=input_, outputs=x)
    return mod
With the custom-layer model I get the same training accuracy (97%), but the validation accuracy stays stuck at around 50%.
Thanks to Matias Valdenegro, I solved the problem by modifying the call method:
    def call(self, inputs):
        # Read the global learning phase (1 = training, 0 = test) and pass it on
        training = K.learning_phase()
        x = self.conv(inputs)
        if self.use_bn:
            x = self.bn(x, training=training)
        if self._dropout_ratio is not None:
            x = self.dropout(x, training=training)
        return x
where K is the keras.backend module.
Answer (score: 0):
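Dropout and Batch Normalization behave differently during training and during testing/inference, and your layer does not implement any of this distinction, so at inference time it runs these inner layers in training mode, which produces wrong results.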
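To see why the mode matters, here is a minimal NumPy sketch of batch normalization. It is not the Keras implementation: gamma/beta are omitted, and momentum=0.99 and epsilon=1e-3 are assumed to mirror Keras' defaults. In training mode it normalizes with the current batch's statistics and updates moving averages; in inference mode it uses those moving averages, so the same input can give very different outputs.

```python
import numpy as np


class SimpleBatchNorm:
    """Minimal batch normalization over axis 0 (the batch axis).

    Assumption: momentum/epsilon mirror Keras' defaults; the learnable
    gamma/beta are fixed at 1/0 to keep the sketch short.
    """

    def __init__(self, num_features, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)

    def __call__(self, x, training=False):
        if training:
            # Training mode: normalize with the current batch's statistics
            # and fold them into the moving averages.
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: use the accumulated moving averages instead.
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.epsilon)


rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(128, 3))
bn = SimpleBatchNorm(num_features=3)

train_out = bn(batch, training=True)   # one training step
infer_out = bn(batch, training=False)  # same batch, inference mode

print(train_out.mean(axis=0))  # ~0 per feature: batch statistics center the data
print(infer_out.mean(axis=0))  # roughly 5 per feature: moving averages barely moved
```

Which branch runs is decided by the training argument, so a wrapper layer has to make sure the intended value reaches its inner BatchNormalization.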
I'm not sure, but I think you can fix this by giving your call method a training argument and passing it on to the inner layers, for example:
    def call(self, inputs, training=None):
        # Forward the training flag to the mode-dependent sublayers
        x = self.conv(inputs)
        if self.use_bn:
            x = self.bn(x, training=training)
        if self._dropout_ratio is not None:
            x = self.dropout(x, training=training)
        return x
This should make the inner layers behave differently in the training and test/inference phases.
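The pattern in this answer, an outer call that accepts training and forwards it to every mode-dependent sublayer, can be illustrated without Keras. Below is a minimal NumPy sketch with inverted dropout; CompositeBlock and SimpleDropout are hypothetical stand-ins for Conv2DLayer and Keras' Dropout, and the sketch defaults training to False (in Keras, training=None instead defers to the backend learning phase).

```python
import numpy as np


class SimpleDropout:
    """Inverted dropout: active only when training=True."""

    def __init__(self, rate, seed=0):
        self.rate = rate
        self.rng = np.random.default_rng(seed)

    def __call__(self, x, training=False):
        if not training:
            return x  # inference: identity
        keep = self.rng.random(x.shape) >= self.rate
        # Rescale kept units so the expected activation is unchanged
        return x * keep / (1.0 - self.rate)


class CompositeBlock:
    """Hypothetical stand-in for Conv2DLayer: the outer call receives
    `training` and forwards it to the mode-dependent sublayer."""

    def __init__(self, dropout_rate=0.5):
        self.dropout = SimpleDropout(dropout_rate)

    def __call__(self, x, training=False):
        x = np.maximum(x, 0.0)  # placeholder for the conv + ReLU step
        return self.dropout(x, training=training)  # forward the flag


block = CompositeBlock()
x = np.ones((4, 8))
print(np.array_equal(block(x, training=False), x))  # True: no dropout at inference
print(np.array_equal(block(x, training=True), x))   # False: units dropped, rest scaled by 2
```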