Why, if we use tf.make_template() in the training phase, must we use tf.make_template() again in the testing phase?

Date: 2018-03-04 01:22:41

Tags: tensorflow deep-learning conv-neural-network

I defined a model function named drrn_model. When I train my model, I build it with:

shared_model = tf.make_template('shared_model', drrn_model)
train_output = shared_model(train_input, is_training=True)

Training runs step by step, and when I want to continue training the model from an earlier point, I can restore the .ckpt file into the model.
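
The save/restore part of the training script looks roughly like this (only a minimal sketch; the checkpoint directory, num_steps and train_op are placeholders, not my exact code):

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ckpt = tf.train.latest_checkpoint("./checkpoints")  # assumed directory
    if ckpt is not None:
        saver.restore(sess, ckpt)  # resume from the old point
    for step in range(num_steps):
        sess.run(train_op)  # train_op is built from train_output elsewhere
        if step % 1000 == 0:
            saver.save(sess, "./checkpoints/drrn", global_step=step)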

But a problem appears when I test the trained model. I used the code below directly, without tf.make_template:

train_output = drrn_model(train_input, is_training=False)

Then the terminal gave me many NotFoundError messages, such as "Key LastLayer/Variable_2 not found in checkpoint". But when I use

shared_model = tf.make_template('shared_model', drrn_model)
output_tensor = shared_model(input_tensor, is_training=False)

it tests normally.

So why do we have to use tf.make_template() again in the testing phase? What is the difference between calling drrn_model directly and calling it through make_template when we build the model?
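
To check what is actually stored, the checkpoint keys can be listed with something like this (a minimal sketch; the checkpoint path is a placeholder):

for name, shape in tf.train.list_variables("./checkpoints/drrn-1000"):
    print(name, shape)
# with make_template the keys are saved under the shared_model/ prefix
# (e.g. shared_model/LastLayer/...), while the direct drrn_model call
# looks for LastLayer/... without that prefix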

There is another question: the BN layer in TensorFlow. I have tried many approaches, but the output is always wrong (always worse than the version without the BN layer). My latest version of the model with BN layers is:

import tensorflow as tf
import numpy as np

tensor = None

def drrn_model(input_tensor, is_training):
    with tf.device("/gpu:0"):
        with tf.variable_scope("FirstLayer"):
            conv_0_w = tf.get_variable("conv_w", [3, 3, 1, 128], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9)))
            tensor = tf.nn.conv2d(tf.nn.relu(batchnorm(input_tensor, is_training= is_training)), conv_0_w, strides=[1,1,1,1], padding="SAME")
            first_layer = tensor
        ### recursion ###
        with tf.variable_scope("recycle", reuse=False):
            tensor = drrnblock(first_layer, tensor, is_training)
        for i in range(1,10):  # 9 more recursive blocks reusing the same weights
            with tf.variable_scope("recycle", reuse=True):
                tensor = drrnblock(first_layer, tensor, is_training)
        ### end layer ###
        with tf.variable_scope("LastLayer"):
            conv_end_w = tf.get_variable("conv_w", [3, 3, 128, 1], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9)))
            conv_end_layer = tf.nn.conv2d(tf.nn.relu(batchnorm(tensor, is_training= is_training)), conv_end_w, strides=[1, 1, 1, 1], padding='SAME')

        tensor = tf.add(input_tensor, conv_end_layer)  # global residual: add the input back

        return tensor


def drrnblock(first_layer, input_layer, is_training):

    conv1_w = tf.get_variable("conv1__w", [3, 3, 128, 128], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9)))
    conv1_layer = tf.nn.conv2d(tf.nn.relu(batchnorm(input_layer, is_training= is_training)), conv1_w, strides=[1,1,1,1], padding= "SAME")

    conv2_w = tf.get_variable("conv2__w", [3, 3, 128, 128], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9)))
    conv2_layer = tf.nn.conv2d(tf.nn.relu(batchnorm(conv1_layer, is_training=is_training)), conv2_w, strides=[1, 1, 1, 1], padding="SAME")

    tensor = tf.add(first_layer, conv2_layer)

    return tensor

def batchnorm(inputs, is_training, decay=0.999):  # this is my BN layer
    scale = tf.Variable(tf.ones([inputs.get_shape()[-1]]))
    beta = tf.Variable(tf.zeros([inputs.get_shape()[-1]]))
    pop_mean = tf.Variable(tf.zeros([inputs.get_shape()[-1]]), trainable=False)
    pop_var = tf.Variable(tf.ones([inputs.get_shape()[-1]]), trainable=False)
    if is_training:
        batch_mean, batch_var = tf.nn.moments(inputs,[0,1,2])
        print("batch_mean.shape: ", batch_mean.shape)
        train_mean = tf.assign(pop_mean, pop_mean*decay+batch_mean*(1-decay))
        train_var = tf.assign(pop_var, pop_var*decay+batch_var*(1-decay))

        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(inputs,batch_mean,batch_var,beta,scale,variance_epsilon=1e-3)
    else:
        return tf.nn.batch_normalization(inputs,pop_mean,pop_var,beta,scale,variance_epsilon=1e-3)
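
For reference, at test time the graph is built and restored roughly like this (again only a sketch; the checkpoint path and the input feeding are placeholders):

shared_model = tf.make_template('shared_model', drrn_model)
output_tensor = shared_model(input_tensor, is_training=False)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "./checkpoints/drrn-1000")  # placeholder path
    result = sess.run(output_tensor, feed_dict={input_tensor: test_image})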

Please tell me what is wrong with my code. Thanks a lot!!

0 Answers:

No answers yet.