I tried to use an Optimizers.schedules.ExponentialDecay instance as the learning_rate of the Adam optimizer, but when training the model with GradientTape I do not know how to pass the "step".
I am using tensorflow-gpu-2.0-alpha0 and python3.6. I have read the documentation at https://tensorflow.google.cn/versions/r2.0/api_docs/python/tf/optimizers/schedules/ExponentialDecay, but I still cannot figure out how to solve this.
initial_learning_rate = 0.1
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96)
optimizer = tf.optimizers.Adam(learning_rate=lr_schedule)

for epoch in range(self.Epoch):
    ...
    ...
    with tf.GradientTape() as tape:
        pred_label = model(images)
        loss = calc_loss(pred_label, ground_label)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    # I tried this but the result does not seem right.
    # I want to pass "epoch" as "step" to lr_schedule
Answer 0 (score: 0)
An instance of tf.keras.optimizers.schedules.ExponentialDecay() may not work well with GradientTape; it is better suited to Keras's model.fit(). My understanding is that you want to lower or schedule the learning rate after a specific number of iterations/steps. As a workaround, you can schedule the learning rate manually using the optimizer class's get_config() and from_config() methods.
def exponential_decay(optimizer, decay_rate):
    # Get the optimizer configuration dictionary.
    opt_cfg = optimizer.get_config()
    # With an initial learning rate of 0.1, opt_cfg will look like this:
    # {'name': 'Adam', 'learning_rate': 0.1, 'decay': 0.0, 'beta_1': 0.9,
    #  'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}

    # Multiply the current learning rate by the decay rate to get the new learning rate.
    opt_cfg['learning_rate'] = opt_cfg['learning_rate'] * decay_rate
    # With an initial learning rate of 0.1 and a decay rate of 0.96, the updated
    # opt_cfg will look like this:
    # {'name': 'Adam', 'learning_rate': 0.096, 'decay': 0.0, 'beta_1': 0.9,
    #  'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}

    # Pass the updated configuration dictionary to from_config() and you are done;
    # the optimizer will now use the new learning rate.
    optimizer = optimizer.from_config(opt_cfg)
    return optimizer
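As a quick sanity check, here is a minimal sketch (not part of the original answer) of how the helper could be exercised on a fresh Adam optimizer; the 0.1 initial rate and 0.96 decay rate are just example values:

import tensorflow as tf

optimizer = tf.optimizers.Adam(learning_rate=0.1)
print(optimizer.get_config()['learning_rate'])   # 0.1

optimizer = exponential_decay(optimizer, decay_rate=0.96)
print(optimizer.get_config()['learning_rate'])   # roughly 0.096 (0.1 * 0.96)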
decay_steps = 100000
decay_rate = 0.96
optimizer = tf.optimizers.Adam(learning_rate=0.1)  # plain initial learning rate, no schedule object

# Loop over epochs.
for epoch in range(self.Epoch):
    ...
    ...
    # Loop over batches.
    for itr, images in enumerate(batch_images):
        # Decay the learning rate every decay_steps iterations.
        if itr > 0 and itr % decay_steps == 0:
            optimizer = exponential_decay(optimizer, decay_rate)
        with tf.GradientTape() as tape:
            pred_label = model(images)
            loss = calc_loss(pred_label, ground_label)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
So what we are doing here is: whenever the iteration/step count reaches the limit we defined, i.e. the number of decay steps (say you want to change the learning rate every 100K steps), we call our exponential_decay() method, passing in the optimizer instance and the decay rate. The method changes the optimizer's learning rate and returns the optimizer instance with the updated learning rate, which you can verify with the optimizer.get_config() method.
For more details, see get_config() and from_config().
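One caveat worth noting: the itr counter in the loop above resets every epoch, so if an epoch contains fewer than decay_steps batches the decay is never triggered. A minimal variant (my own assumption, not part of the answer, and reusing the same placeholders model, calc_loss, ground_label and batch_images) that tracks a global step counter across epochs could look like this:

global_step = 0
for epoch in range(self.Epoch):
    for images in batch_images:
        global_step += 1
        # Decay once every decay_steps optimizer updates, counted across all epochs.
        if global_step % decay_steps == 0:
            optimizer = exponential_decay(optimizer, decay_rate)
        with tf.GradientTape() as tape:
            pred_label = model(images)
            loss = calc_loss(pred_label, ground_label)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))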