How to resume training from an Inception-v3 checkpoint with different trainable variables

Date: 2017-05-06 03:23:45

Tags: tensorflow tf-slim

I have a very common use case: freeze the lower layers of Inception and train only the top two layers, then lower the learning rate and fine-tune the whole Inception model.

Here is the code I run for the first part:

import tensorflow as tf
from tensorflow.contrib import slim
# The Inception v3 model definition; it may equally come from the
# TF models repository's slim/nets directory.
from tensorflow.contrib.slim.nets import inception

train_dir='/home/ubuntu/pynb/TF play/log-inceptionv3flowers'
with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)

    dataset = get_dataset()
    images, _, labels = load_batch(dataset, batch_size=32)

    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v3_arg_scope()):
        logits, _ = inception.inception_v3(images, num_classes=5, is_training=True)

    # Specify the loss function:
    one_hot_labels = slim.one_hot_encoding(labels, 5)
    tf.losses.softmax_cross_entropy(one_hot_labels, logits)
    total_loss = tf.losses.get_total_loss()

    # Create some summaries to visualize the training process:
    tf.summary.scalar('losses/Total Loss', total_loss)

    # Specify the optimizer and create the train op:
    optimizer = tf.train.RMSPropOptimizer(0.001, 0.9,
                                          momentum=0.9, epsilon=1.0)
    # Only the variables returned by get_variables_to_train() are updated:
    train_op = slim.learning.create_train_op(total_loss, optimizer, variables_to_train=get_variables_to_train())

    # Run the training:
    final_loss = slim.learning.train(
        train_op,
        logdir=train_dir,
        init_fn=get_init_fn(),
        number_of_steps=4500,
        save_summaries_secs=30,
        save_interval_secs=30,
        session_config=tf.ConfigProto(gpu_options=gpu_options))  # gpu_options is defined elsewhere in the notebook

print('Finished training. Last batch loss %f' % final_loss)
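
For reference, get_init_fn and get_variables_to_train follow the usual TF-Slim flowers pattern; a minimal sketch (the checkpoint path and scope names here are illustrative):

checkpoint_path = '/home/ubuntu/pynb/TF play/inception_v3.ckpt'  # illustrative path

def get_init_fn():
    # Restore all pretrained weights except the retrained top layers.
    exclusions = ['InceptionV3/Logits', 'InceptionV3/AuxLogits']
    variables_to_restore = [
        var for var in slim.get_model_variables()
        if not any(var.op.name.startswith(ex) for ex in exclusions)]
    return slim.assign_from_checkpoint_fn(checkpoint_path, variables_to_restore)

def get_variables_to_train():
    # Train only the freshly initialized top layers in the first phase.
    scopes = ['InceptionV3/Logits', 'InceptionV3/AuxLogits']
    variables_to_train = []
    for scope in scopes:
        variables_to_train.extend(
            tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope))
    return variables_to_train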

That runs properly. Then my code runs the second part:

train_dir='/home/ubuntu/pynb/TF play/log-inceptionv3flowers'
with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)

    dataset = get_dataset()
    images, _, labels = load_batch(dataset, batch_size=32)

    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v3_arg_scope()):
        logits, _ = inception.inception_v3(images, num_classes=5, is_training=True)

    # Specify the loss function:
    one_hot_labels = slim.one_hot_encoding(labels, 5)
    tf.losses.softmax_cross_entropy(one_hot_labels, logits)
    total_loss = tf.losses.get_total_loss()

    # Create some summaries to visualize the training process:
    tf.summary.scalar('losses/Total Loss', total_loss)

    # Specify the optimizer and create the train op:
    optimizer = tf.train.RMSPropOptimizer(0.0001, 0.9,
                                          momentum=0.9, epsilon=1.0)
    # No variables_to_train argument, so every trainable variable is updated:
    train_op = slim.learning.create_train_op(total_loss, optimizer)

    # Run the training:
    final_loss = slim.learning.train(
        train_op,
        logdir=train_dir,
        init_fn=get_init_fn(),
        number_of_steps=10000,
        save_summaries_secs=30,
        save_interval_secs=30,
        session_config=tf.ConfigProto(gpu_options=gpu_options))

print('Finished training. Last batch loss %f' % final_loss)

Notice that in the second part, I don't pass anything into the variables_to_train argument of create_train_op. An error then shows up.

I suspect it is looking for the RMSProp variables for the InceptionV3/Conv2d_4a_3x3 layer, which don't exist because I didn't train that layer when producing the previous checkpoint. I'm not sure how to achieve what I want, since I can't see an example of how to do this in the documentation.
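
A quick way to confirm the suspicion is to list what the phase-one checkpoint actually contains; a minimal sketch using the checkpoint reader:

# List the keys saved in the latest phase-one checkpoint and check whether
# any RMSProp slot variables exist for Conv2d_4a_3x3 (expected: none).
ckpt = tf.train.latest_checkpoint(train_dir)
reader = tf.train.NewCheckpointReader(ckpt)
for name in sorted(reader.get_variable_to_shape_map()):
    if 'Conv2d_4a_3x3' in name:
        print(name)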

1 Answer:

Answer 0 (score: 1)

TF-Slim supports reading from checkpoints whose variable names don't match, as described here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/learning.py#L146

You can specify how the variable names in the checkpoint map to the variables in your model.
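
For example, that documentation builds a dictionary from checkpoint names to model variables (a sketch; the name_in_checkpoint mapping and scope prefix below are illustrative):

# Restore variables whose names in the checkpoint differ from their names in
# the current model; name_in_checkpoint is a hypothetical mapping.
def name_in_checkpoint(var):
    return 'old_scope/' + var.op.name

variables_to_restore = {name_in_checkpoint(var): var
                        for var in slim.get_model_variables()}
init_fn = slim.assign_from_checkpoint_fn(checkpoint_path, variables_to_restore)

For your case, where the phase-one checkpoint is simply missing the RMSProp slot variables, slim.assign_from_checkpoint_fn also accepts ignore_missing_vars=True, which restores everything it finds in the checkpoint and leaves the missing variables at their initial values.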

I hope that helps!