I am attempting (rightly or wrongly) to write a modified version of the Keras SGD optimizer with the TensorFlow backend. The idea is to schedule "restarts" of SGD at specified epochs with a smaller learning rate, but without saving and reloading the model. For the learning-rate decay to behave as if such a restart had occurred, I need to track not only the total number of iterations but also the number of iterations since the last "restart".
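Concretely, the schedule is meant to be passed in as a dict mapping epoch numbers to learning rates; the values here are just an illustration, not my real schedule:

lr_dict = {1: 0.1, 30: 0.01, 60: 0.002}  # at the start of epoch 30, "restart" SGD with lr = 0.01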
So, when the SGD optimizer object (i.e. SGD_VAR) is created, I initialize my "iterations since last restart" counter (self.iterations_ref) to 0, just as self.iterations is initialized to 0. Then, on every iteration, I increment each counter by 1, unless a reset occurs, in which case I reset my counter (self.iterations_ref) to 1. The code I am using is shown here (it inherits from Keras's SGD class, with only minor modifications):
import tensorflow as tf
from keras import backend as K
from keras.legacy import interfaces
from keras.optimizers import SGD


class SGD_VAR(SGD):
    """Stochastic gradient descent optimizer.

    Includes support for momentum,
    learning rate decay, and Nesterov momentum.

    # Arguments
        lr: float >= 0. Learning rate.
        momentum: float >= 0. Parameter that accelerates SGD
            in the relevant direction and dampens oscillations.
        decay: float >= 0. Learning rate decay over each update.
        nesterov: boolean. Whether to apply Nesterov momentum.
    """

    def __init__(self, lr=0.05, momentum=0., decay=0.,
                 nesterov=False, lr_dict={},
                 batches_per_epoch=1562,
                 **kwargs):
        super(SGD_VAR, self).__init__(lr, momentum, decay,
                                      nesterov, **kwargs)
        if lr_dict == {}:
            lr_dict = {0: lr}
        self.lr_dict = lr_dict
        self.batches_per_epoch = batches_per_epoch
        with K.name_scope(self.__class__.__name__):
            # Here is where I initialize *MY* iterations counter
            self.iterations_ref = K.variable(0, dtype='int64',
                                             name='iterations_ref')
            self.new_lr = K.variable(lr, name='new_lr')

    @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        def lr_stepper(iteration, lr):
            '''Wrapped Python method used by the tensor
            to determine the desired learning rate.'''
            # Change the learning rate when specified
            # in lr_dict (dict of {epoch: learning rate})
            for x in self.lr_dict:
                temp = tf.Variable((x - 1) * self.batches_per_epoch,
                                   dtype=iteration.dtype)
                if tf.equal(temp, iteration):
                    return tf.constant(self.lr_dict[x], dtype=lr.dtype)
            return lr

        # NOTE: K.update_add and K.update
        # return tf.assign_add and tf.assign, respectively
        self.updates = [K.update_add(self.iterations, 1)]

        # Key lines to change self.lr
        new_lr = tf.contrib.eager.py_func(func=lr_stepper,
                                          inp=[self.iterations, self.lr],
                                          Tout=tf.float32)

        # Note: self.lr != new_lr indicates a RESET has occurred
        new_iter_ref = tf.cond(tf.math.equal(self.lr, new_lr),
                               lambda: K.update_add(self.iterations_ref, 1),
                               lambda: K.update(self.iterations_ref, 1))
        self.updates.append(K.update(self.lr, new_lr))
        self.updates.append(new_iter_ref)

        # Temporary code to debug output
        self.iterations = tf.Print(self.lr,
                                   [self.iterations, self.iterations_ref, self.lr],
                                   message="\n Debug Vals:")
        # ... (the rest of get_updates is the standard Keras SGD update logic)
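For reference, this is roughly how I wire the optimizer into a model; the model and numbers below are placeholders rather than my actual setup:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
opt = SGD_VAR(lr=0.1, lr_dict={1: 0.1, 30: 0.01},
              batches_per_epoch=1562)
model.compile(optimizer=opt, loss='categorical_crossentropy')
# model.fit(x_train, y_train, batch_size=32, epochs=60)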
I am using tf.Print to print out self.iterations, self.iterations_ref, and self.lr; each value appears in square brackets. I expected tf.Print to show self.iterations and self.iterations_ref equal to each other (ignoring the effect of any resets), but instead they stay exactly 1 apart, i.e. the output I see is:
Debug Vals:[1][0][0.1]
Debug Vals:[2][1][0.1]
Debug Vals:[3][2][0.1]
Debug Vals:[4][3][0.1]
...
What I expected was:
Debug Vals:[1][1][0.1]
Debug Vals:[2][2][0.1]
Debug Vals:[3][3][0.1]
Debug Vals:[4][4][0.1]
...
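My expectation rests on the assumption that two counters driven by identical K.update_add ops advance in lockstep. Outside of Keras, a bare TensorFlow 1.x check of that assumption (illustrative only, not part of the optimizer) does behave the way I expect:

import tensorflow as tf

a = tf.Variable(0, dtype=tf.int64)
b = tf.Variable(0, dtype=tf.int64)
inc_a = tf.assign_add(a, 1)  # the op K.update_add wraps
inc_b = tf.assign_add(b, 1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        print(sess.run([inc_a, inc_b]))  # [1, 1], [2, 2], [3, 3]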
Why is this happening? (Note: I am using Keras 2.2.4 and TensorFlow 1.8.)