I'm implementing a CNN for the Kaggle Digit Recognizer competition. The architecture is:

conv5x5(filters=32) - conv5x5(filters=32) - maxpool2x2 - conv3x3(filters=64) - conv3x3(filters=64) - maxpool2x2 - FC(512) - dropout(keep_prob=0.25) - softmax(10)

This architecture reaches 99.728% accuracy on Digit Recognizer.
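For reference, the same stack written compactly with tf.layers (just a sketch to make the layout concrete, not my actual code; it assumes 28x28x1 inputs):

    # Sketch of the baseline architecture above using tf.layers (assumes 28x28x1 inputs)
    import tensorflow as tf

    def digits_cnn(X, training):
        A = tf.layers.conv2d(X, 32, 5, padding='same', activation=tf.nn.relu)
        A = tf.layers.conv2d(A, 32, 5, padding='same', activation=tf.nn.relu)
        A = tf.layers.max_pooling2d(A, 2, 2)
        A = tf.layers.conv2d(A, 64, 3, padding='same', activation=tf.nn.relu)
        A = tf.layers.conv2d(A, 64, 3, padding='same', activation=tf.nn.relu)
        A = tf.layers.max_pooling2d(A, 2, 2)
        A = tf.layers.flatten(A)
        A = tf.layers.dense(A, 512, activation=tf.nn.relu)
        A = tf.layers.dropout(A, rate=0.75, training=training)  # keep_prob = 0.25
        return tf.layers.dense(A, 10)  # logits; softmax is applied in the loss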
I want to add batch normalization to the conv layers. I add it like this:
#Forward propagation of the whole CNN#
import tensorflow as tf

def forward_propagation(X, keep_prob_l5, BN_is_training, conv_params,
                        convstride1_shape, convstride2_shape, pool2_shape, poolstride2_shape,
                        convstride3_shape, convstride4_shape, pool4_shape, poolstride4_shape,
                        n_5, n_out):
    W1 = conv_params['W1']
    b1 = conv_params['b1']
    W2 = conv_params['W2']
    b2 = conv_params['b2']
    W3 = conv_params['W3']
    b3 = conv_params['b3']
    W4 = conv_params['W4']
    b4 = conv_params['b4']

    # Block 1: two conv5x5(32) layers, each conv -> batch norm -> ReLU, then 2x2 max-pool
    Z1 = tf.nn.bias_add(tf.nn.conv2d(X, W1, strides=convstride1_shape, padding='SAME'), b1, data_format='NHWC')
    Z1_batchnorm = tf.contrib.layers.batch_norm(Z1, center=True, scale=True, is_training=BN_is_training, data_format='NHWC')
    A1 = tf.nn.relu(Z1_batchnorm)
    Z2 = tf.nn.bias_add(tf.nn.conv2d(A1, W2, strides=convstride2_shape, padding='SAME'), b2, data_format='NHWC')
    Z2_batchnorm = tf.contrib.layers.batch_norm(Z2, center=True, scale=True, is_training=BN_is_training, data_format='NHWC')
    A2 = tf.nn.relu(Z2_batchnorm)
    P2 = tf.nn.max_pool(A2, ksize=pool2_shape, strides=poolstride2_shape, padding='SAME')

    # Block 2: two conv3x3(64) layers, each conv -> batch norm -> ReLU, then 2x2 max-pool
    Z3 = tf.nn.bias_add(tf.nn.conv2d(P2, W3, strides=convstride3_shape, padding='SAME'), b3, data_format='NHWC')
    Z3_batchnorm = tf.contrib.layers.batch_norm(Z3, center=True, scale=True, is_training=BN_is_training, data_format='NHWC')
    A3 = tf.nn.relu(Z3_batchnorm)
    Z4 = tf.nn.bias_add(tf.nn.conv2d(A3, W4, strides=convstride4_shape, padding='SAME'), b4, data_format='NHWC')
    Z4_batchnorm = tf.contrib.layers.batch_norm(Z4, center=True, scale=True, is_training=BN_is_training, data_format='NHWC')
    A4 = tf.nn.relu(Z4_batchnorm)
    P4 = tf.nn.max_pool(A4, ksize=pool4_shape, strides=poolstride4_shape, padding='SAME')

    # Head: flatten -> FC(512) with ReLU -> dropout -> linear logits for softmax(10)
    P4_flatten = tf.contrib.layers.flatten(P4)
    A5 = tf.contrib.layers.fully_connected(P4_flatten, n_5, activation_fn=tf.nn.relu)
    A5_drop = tf.nn.dropout(A5, keep_prob_l5)
    Z_out = tf.contrib.layers.fully_connected(A5_drop, n_out, activation_fn=None)
    return tf.transpose(Z_out)
BN_is_training is a placeholder that is fed True at training time and False at inference time.
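For completeness, the placeholder is created and fed roughly like this (a simplified sketch; names like Z_out, batch_X, test_X are illustrative, not my exact variable names):

    BN_is_training = tf.placeholder(tf.bool, name='BN_is_training')

    # Training step: batch norm uses batch statistics and updates its moving averages
    sess.run([optimizer, cost],
             feed_dict={X: batch_X, Y: batch_Y, keep_prob_l5: 0.25, BN_is_training: True})

    # Inference: batch norm uses the accumulated moving mean/variance, dropout disabled
    logits = sess.run(Z_out,
                      feed_dict={X: test_X, keep_prob_l5: 1.0, BN_is_training: False})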
The update_ops are set up as follows:
#Define the optimization method#
# Run the batch-norm moving-average updates as part of every training step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=decayed_learning_rate).minimize(cost)
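As a sanity check, the collected update ops can be inspected before training (a diagnostic sketch; tf.contrib.layers.batch_norm registers its updates in tf.GraphKeys.UPDATE_OPS by default):

    # With four batch_norm layers there should be eight update ops:
    # one moving-mean and one moving-variance update per layer
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    for op in update_ops:
        print(op.name)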
However, the results are really strange: the accuracy never improves and the cost keeps increasing. Did I make a mistake somewhere in setting up batch norm?

Thanks :D