I am attempting (rightly or wrongly) to write a modified version of the Keras SGD optimizer with the TensorFlow backend. The idea is to schedule "restarts" of SGD at specified epochs with a smaller learning rate, but without saving and reloading the model. For the learning-rate decay to behave as if such a restart had occurred, I need to track not only the total number of iterations but also the number of iterations since the last "restart".
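Concretely, the schedule is meant to be passed in as a dict mapping epoch numbers to learning rates; the values here are just an illustration, not my real schedule:

lr_dict = {1: 0.1, 30: 0.01, 60: 0.002}  # at the start of epoch 30, "restart" SGD with lr = 0.01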
So, when the SGD optimizer object (i.e. SGD_VAR) is created, I initialize my "iterations since last restart" counter (self.iterations_ref) to 0, just as self.iterations is initialized to 0. Then, on every iteration, I increment each counter by 1, unless a reset occurs, in which case I reset my counter (self.iterations_ref) to 1. The code I am using is shown here (it inherits from Keras's SGD class, with only minor modifications):
import tensorflow as tf
from keras import backend as K
from keras.legacy import interfaces
from keras.optimizers import SGD


class SGD_VAR(SGD):
    """Stochastic gradient descent optimizer.

    Includes support for momentum,
    learning rate decay, and Nesterov momentum.

    # Arguments
        lr: float >= 0. Learning rate.
        momentum: float >= 0. Parameter that accelerates SGD
            in the relevant direction and dampens oscillations.
        decay: float >= 0. Learning rate decay over each update.
        nesterov: boolean. Whether to apply Nesterov momentum.
    """

    def __init__(self, lr=0.05, momentum=0., decay=0.,
                 nesterov=False, lr_dict={},
                 batches_per_epoch=1562,
                 **kwargs):
        super(SGD_VAR, self).__init__(lr, momentum, decay,
                                      nesterov, **kwargs)
        if lr_dict == {}:
            lr_dict = {0: lr}
        self.lr_dict = lr_dict
        self.batches_per_epoch = batches_per_epoch
        with K.name_scope(self.__class__.__name__):
            # Here is where I initialize *MY* iterations counter
            self.iterations_ref = K.variable(0, dtype='int64',
                                             name='iterations_ref')
            self.new_lr = K.variable(lr, name='new_lr')

    @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        def lr_stepper(iteration, lr):
            '''Wrapped Python method used by the tensor
            to determine the desired learning rate.'''
            # Change the learning rate when specified
            # in lr_dict (dict of {epoch: learning rate})
            for x in self.lr_dict:
                temp = tf.Variable((x - 1) * self.batches_per_epoch,
                                   dtype=iteration.dtype)
                if tf.equal(temp, iteration):
                    return tf.constant(self.lr_dict[x], dtype=lr.dtype)
            return lr

        # NOTE: K.update_add and K.update
        # return tf.assign_add and tf.assign, respectively
        self.updates = [K.update_add(self.iterations, 1)]

        # Key lines to change self.lr
        new_lr = tf.contrib.eager.py_func(func=lr_stepper,
                                          inp=[self.iterations, self.lr],
                                          Tout=tf.float32)

        # Note: self.lr != new_lr indicates a RESET has occurred
        new_iter_ref = tf.cond(tf.math.equal(self.lr, new_lr),
                               lambda: K.update_add(self.iterations_ref, 1),
                               lambda: K.update(self.iterations_ref, 1))
        self.updates.append(K.update(self.lr, new_lr))
        self.updates.append(new_iter_ref)

        # Temporary code to debug output
        self.iterations = tf.Print(self.lr,
                                   [self.iterations, self.iterations_ref, self.lr],
                                   message="\n Debug Vals:")
        # ... (the rest of get_updates is the standard Keras SGD update logic)
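For reference, this is roughly how I wire the optimizer into a model; the model and numbers below are placeholders rather than my actual setup:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
opt = SGD_VAR(lr=0.1, lr_dict={1: 0.1, 30: 0.01},
              batches_per_epoch=1562)
model.compile(optimizer=opt, loss='categorical_crossentropy')
# model.fit(x_train, y_train, batch_size=32, epochs=60)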
I am using tf.Print to print out self.iterations, self.iterations_ref, and self.lr; each value appears in square brackets. I expected tf.Print to show self.iterations and self.iterations_ref equal to each other (ignoring the effect of any resets), but instead they stay exactly 1 apart, i.e. the output I see is:
Debug Vals:[1][0][0.1]
Debug Vals:[2][1][0.1]
Debug Vals:[3][2][0.1]
Debug Vals:[4][3][0.1]
...
What I expected was:
Debug Vals:[1][1][0.1]
Debug Vals:[2][2][0.1]
Debug Vals:[3][3][0.1]
Debug Vals:[4][4][0.1]
...
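My expectation rests on the assumption that two counters driven by identical K.update_add ops advance in lockstep. Outside of Keras, a bare TensorFlow 1.x check of that assumption (illustrative only, not part of the optimizer) does behave the way I expect:

import tensorflow as tf

a = tf.Variable(0, dtype=tf.int64)
b = tf.Variable(0, dtype=tf.int64)
inc_a = tf.assign_add(a, 1)  # the op K.update_add wraps
inc_b = tf.assign_add(b, 1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        print(sess.run([inc_a, inc_b]))  # [1, 1], [2, 2], [3, 3]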
Why is this happening? (Note: I am using Keras 2.2.4 and TensorFlow 1.8.)