Question

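The tensorflow/models resnet implementation uses a variable learning rate based on global_step. When I restore the model from a checkpoint in a later run, I would like to keep the same global_step, so that the variable learning rate continues from the global_step reached the last time the model was trained. That would make it much easier to run groups of epochs in isolation and pick up where I left off.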
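A step-based schedule like this is a pure function from the current step to a learning rate. The sketch below is a plain-Python analog that assumes the interval convention documented for `tf.train.piecewise_constant` (the rate changes only after the step passes each boundary). The function name and the example boundaries and rates are hypothetical, chosen only for illustration. Because the rate depends only on the step value, a step that restarts at 0 also restarts the schedule at its initial rate.

```python
from bisect import bisect_left


def piecewise_constant(step, boundaries, values):
    """Plain-Python analog of a piecewise-constant schedule over global steps.

    Assumed convention (as documented for tf.train.piecewise_constant):
    step <= boundaries[0] -> values[0];
    boundaries[i-1] < step <= boundaries[i] -> values[i];
    step > boundaries[-1] -> values[-1].
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    # bisect_left returns the index of the interval that contains `step`.
    return values[bisect_left(boundaries, step)]


# Hypothetical schedule: decay 10x at steps 1000 and 2000.
boundaries = [1000, 2000]
values = [0.1, 0.01, 0.001]

print(piecewise_constant(1500, boundaries, values))  # 0.01 (mid-schedule)
print(piecewise_constant(0, boundaries, values))     # 0.1 (a step reset to 0 restarts the schedule)
```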

But when I restore the model and continue training, the weights are restored, yet the logs show that the step has been reset:

INFO:tensorflow:loss = 0.22592029, step = 0 (289.312 sec)

The global_step is obtained by calling tf.train.get_or_create_global_step(), and the optimizer is tf.train.MomentumOptimizer.

In my tf.estimator.Estimator model function:

if mode == tf.estimator.ModeKeys.TRAIN:
    global_step = tf.train.get_or_create_global_step()

    # This function returns a closure configured to build the variable learning rate tensor.
    learning_rate_fn = build_learning_rate_fn(fine_tune=fine_tune, batch_size=batch_size, num_classes=num_classes, training_set_size=training_set_size)

    # The global_step tensor passed to it represents the ordinal of the current batch.
    # The learning-rate tensor evaluates to an appropriate rate based on the value of global_step at run time.
    # When restoring the estimator graph, global_step always starts at 0 and increases monotonically from there.
    # Is there a way to persist it so that its value picks up where it left off?
    learning_rate = learning_rate_fn(global_step)

    optimizer = tf.train.MomentumOptimizer(
        learning_rate=learning_rate,
        momentum=0.9
    )
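For the schedule to resume, a checkpoint has to carry the step counter alongside the weights and the optimizer state. The sketch below is a toy, framework-free analog of that idea, not TensorFlow's checkpoint API: the JSON file, `lr_fn`, `train_step`, `save`, and `restore` are all hypothetical stand-ins. The update follows the documented `tf.train.MomentumOptimizer` rule (`accum = momentum * accum + grad; var -= lr * accum`), and incrementing the step inside the update mirrors passing `global_step` to `apply_gradients`/`minimize`.

```python
import json
import os
import tempfile


def lr_fn(step):
    """Hypothetical schedule: 0.1 up to step 100, then 0.01."""
    return 0.1 if step <= 100 else 0.01


def train_step(state, grad, momentum=0.9):
    """One momentum update on a scalar weight, then increment the step counter."""
    lr = lr_fn(state["global_step"])
    state["accum"] = momentum * state["accum"] + grad  # momentum accumulator
    state["weight"] -= lr * state["accum"]
    state["global_step"] += 1  # analogous to apply_gradients(..., global_step=...)
    return state


def save(state, path):
    """Write weights, optimizer state, and step together (toy checkpoint)."""
    with open(path, "w") as f:
        json.dump(state, f)


def restore(path):
    """Read the toy checkpoint back; the step returns with the weights."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    state = {"weight": 1.0, "accum": 0.0, "global_step": 0}
    for _ in range(150):  # first "run"
        state = train_step(state, grad=0.01)
    save(state, ckpt)

    resumed = restore(ckpt)  # second "run" resumes the schedule at step 150
    print(resumed["global_step"], lr_fn(resumed["global_step"]))  # 150 0.01
```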

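Edit: As an alternative, rather than requiring global_step to carry over so the variable learning rate can be computed across runs, I am considering changing the learning-rate decay boundaries so they better fit my own approach.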

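For example, for ImageNet training, tensorflow/models resnet defines the boundary epochs as [30, 60, 80, 90]. For a less complex problem such as binary classification, that is a lot of epochs, especially when training from scratch on only 100K images.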
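To put those boundaries in perspective, the arithmetic below converts boundary epochs into global-step boundaries for a 100K-image dataset: steps per epoch is the number of images divided by the batch size, and each boundary is that figure times the boundary epoch. The batch size of 128 is an assumption for illustration, and `epochs_to_steps` is a hypothetical helper.

```python
def epochs_to_steps(boundary_epochs, num_images, batch_size):
    """Convert boundary epochs to global-step boundaries (hypothetical helper)."""
    batches_per_epoch = num_images / batch_size
    return [int(batches_per_epoch * epoch) for epoch in boundary_epochs]


# Assumed: 100K training images, batch size 128 -> 781.25 batches per epoch.
steps = epochs_to_steps([30, 60, 80, 90], num_images=100_000, batch_size=128)
print(steps)  # [23437, 46875, 62500, 70312]
```

Under these assumptions, reaching the last ImageNet boundary takes about 70K steps on this dataset.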

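To keep the decay contained within each individual run, I could use smaller, closer-together boundary epochs, such as [10, 20, 25, 30], since I want to keep each isolated training run under 30 epochs.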
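The sketch below shows, for a 30-epoch run, how many epochs fall under each rate with the compressed boundaries [10, 20, 25, 30]. It assumes the same interval convention as `tf.train.piecewise_constant`, evaluates the rate at the end of each epoch, and uses a hypothetical base rate of 0.1 with a 10x decay at each boundary; the helper names are illustrative only.

```python
from bisect import bisect_left


def rate_for_epoch(epoch, boundary_epochs, rates):
    """Learning rate in effect at `epoch` (may be fractional).

    Assumed convention: epoch <= boundary_epochs[0] -> rates[0], ...,
    epoch > boundary_epochs[-1] -> rates[-1].
    """
    return rates[bisect_left(boundary_epochs, epoch)]


def epochs_per_rate(num_epochs, boundary_epochs, rates):
    """Count how many epochs (evaluated at the end of each epoch) use each rate."""
    counts = {rate: 0 for rate in rates}
    for epoch in range(1, num_epochs + 1):
        counts[rate_for_epoch(epoch, boundary_epochs, rates)] += 1
    return counts


# Assumed: base rate 0.1 with a 10x decay at each compressed boundary.
rates = [0.1, 0.01, 0.001, 0.0001, 0.00001]
print(epochs_per_rate(30, [10, 20, 25, 30], rates))
# {0.1: 10, 0.01: 10, 0.001: 5, 0.0001: 5, 1e-05: 0}
```

With this convention the final rate only takes effect after epoch 30, so a run capped at 30 epochs never reaches it.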

I am trying to understand whether this is a reasonable approach.

How do I persist global_step in a tf.Estimator checkpoint?
