Caffe has multistep decay. It is computed as base_lr * gamma ^ (floor(step)), where step is incremented after each decay step. For example, with decay steps [100, 200] and global step = 101 I want to get base_lr * gamma ^ 1; with global step = 201 and above I want to get base_lr * gamma ^ 2, and so on.
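To make the semantics concrete, here is a minimal pure-Python sketch of the rule I want (multistep_lr is just an illustrative helper, not Caffe's code):

def multistep_lr(base_lr, gamma, decay_steps, global_step):
    # step = number of boundaries that global_step has already passed
    step = sum(1 for s in decay_steps if global_step > s)
    return base_lr * gamma ** step

multistep_lr(0.01, 0.1, [100, 200], 101)  # 0.001  (gamma ^ 1)
multistep_lr(0.01, 0.1, [100, 200], 201)  # 0.0001 (gamma ^ 2)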
I tried to implement it based on the exponential decay source, but I got nowhere. Here is the exponential decay code (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/learning_rate_decay.py#L27):
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False, name=None):
  with ops.name_scope(name, "ExponentialDecay",
                      [learning_rate, global_step,
                       decay_steps, decay_rate]) as name:
    learning_rate = ops.convert_to_tensor(learning_rate, name="learning_rate")
    dtype = learning_rate.dtype
    global_step = math_ops.cast(global_step, dtype)
    decay_steps = math_ops.cast(decay_steps, dtype)
    decay_rate = math_ops.cast(decay_rate, dtype)
    p = global_step / decay_steps
    if staircase:
      p = math_ops.floor(p)
    return math_ops.mul(learning_rate, math_ops.pow(decay_rate, p), name=name)
I would have to pass decay_steps as some kind of array (a Python list or a Tensor). Also, I would (?) have to track the current_decay_step (the step in the formula above).
First option: in pure Python, without tensors, it is very simple:
decay_steps.append(global_step)
p = sorted(decay_steps).index(global_step)  # maybe there must be a +1 or -1 here; I hope the main idea is clear
I can't do this, because there is no sort in TF, and I don't know how long implementing one would take.
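Actually, sorting can be sidestepped entirely: if decay_steps is already sorted, the index p is just the number of boundaries that global_step has passed, which TF can compute with a comparison plus a reduce_sum (a sketch, assuming decay_steps is a float32 tensor; untested):

p = tf.reduce_sum(tf.cast(tf.greater(tf.cast(global_step, tf.float32), decay_steps),
                          tf.float32))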
Second option: something like the code below. It doesn't work, for several reasons. First, I don't know how to pass args to a function inside tf.cond. Second, even if I could pass args: Can cond support TF ops with side effects?
def new_decay_step(decay_steps):
    decay_steps = decay_steps[1:]
    current_decay_step.assign(current_decay_step + 1)
    return tf.no_op()

tf.cond(tf.greater(tf.shape(decay_steps)[0], 0),
        tf.cond(tf.greater(global_step, decay_steps[0]), new_decay_step, tf.no_op()),
        tf.no_op())
p = current_decay_step
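For reference, a hedged sketch of how a side-effecting update is usually expressed with tf.cond: the assign op has to be created inside the branch function and returned, so that taking the branch actually runs it (counter and boundary are illustrative names, not from my code above):

counter = tf.Variable(0, trainable=False)

def bump():
    return tf.assign(counter, counter + 1)  # returned, so it runs when this branch is taken

def keep():
    return tf.identity(counter)

counter_next = tf.cond(tf.greater(global_step, boundary), bump, keep)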
Third option: it doesn't work, because I can't fetch an element with tensor[another_tensor].
# if len(decay_steps) > (current_step + 1):
#     if global_step > decay_steps[current_step + 1]:
#         current_step += 1
current_decay_step = tf.cond(tf.greater(tf.shape(current_decay_step)[0], tf.add(current_decay_step, 1)),
                             tf.cond(tf.greater(global_step, decay_steps[tf.add(current_decay_step, 1)]),
                                     tf.add(current_decay_step, 1),
                                     tf.add(current_decay_step, 0)),
                             tf.add(current_decay_step, 0))
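As an aside, the tensor[another_tensor] indexing that breaks here can usually be written with tf.gather (a sketch, assuming current_decay_step is an integer scalar tensor):

next_boundary = tf.gather(decay_steps, current_decay_step + 1)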
What should I do?
UPD: I can almost get the second option to work. I can write
def nothing(): return tf.no_op()

tf.cond(tf.greater(global_step, decay_steps[0]),
        functools.partial(new_decay_step, decay_steps),
        nothing)
but for some reason the inner tf.cond does not work. For this code I get the error fn1 must be callable:
def nothing(): return tf.no_op()

tf.cond(tf.greater(tf.shape(decay_steps)[0], 0),
        tf.cond(tf.greater(global_step, decay_steps[0]),
                functools.partial(new_decay_step, decay_steps),
                nothing),
        nothing)
UPD2: the inner tf.cond does not work because it returns a tensor, while the arguments of tf.cond must be functions.
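So the fix should be to wrap the inner cond in a lambda, deferring its construction until the outer branch is chosen. A minimal self-contained sketch (the constants stand in for the real decay logic):

p = tf.cond(tf.greater(tf.shape(decay_steps)[0], 0),
            lambda: tf.cond(tf.greater(global_step, decay_steps[0]),
                            lambda: tf.constant(1.0),
                            lambda: tf.constant(0.0)),
            lambda: tf.constant(0.0))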
I haven't verified it, but the following seems to work (at least it doesn't crash with an error):
tf.cond(tf.logical_and(tf.greater(tf.shape(decay_steps)[0], 0), tf.greater(global_step, decay_steps[0])),
        functools.partial(new_decay_step, decay_steps),
        nothing)
UPD3: I realized that the code in UPD2 cannot work, because I can't mutate the list inside the function. Also, I don't know which parts of the tf.logical_and are actually executed.
I wrote the following code:
class ohmy:
    def __init__(self, decay_steps):
        self.decay_steps = decay_steps

    def multistep_decay(self, learning_rate, global_step, current_decay_step, decay_steps, decay_rate,
                        staircase=False, name=None):
        learning_rate = tf.convert_to_tensor(learning_rate, name="learning_rate")
        dtype = learning_rate.dtype
        global_step = tf.cast(global_step, dtype)
        decay_rate = tf.cast(decay_rate, dtype)

        def new_step():
            self.decay_steps = self.decay_steps[1:]
            current_decay_step.assign(current_decay_step + 1)
            return current_decay_step

        def curr_step():
            return current_decay_step

        current_decay_step = tf.cond(tf.logical_and(tf.greater(tf.shape(self.decay_steps)[0], 0),
                                                    tf.greater(global_step, self.decay_steps[0])),
                                     new_step,
                                     curr_step)

        a = tf.Print(global_step, [global_step], "global")
        b = tf.Print(self.decay_steps, [self.decay_steps], "decay_steps")
        c = tf.Print(current_decay_step, [current_decay_step], "step")

        with tf.control_dependencies([a, b, c, current_decay_step]):
            p = current_decay_step
            if staircase:
                p = tf.floor(p)
            return tf.mul(learning_rate, tf.pow(decay_rate, p), name=name)
decay_steps = [3, 4, 5, 6, 7]
decay_steps = tf.convert_to_tensor(decay_steps, dtype=tf.float32)
current_decay_step = tf.Variable(0.0, trainable=False)
global_step = tf.Variable(0, trainable=False)
decay_rate = 0.5

c = ohmy(decay_steps)
lr = ohmy.multistep_decay(c, 0.010, global_step, current_decay_step, decay_steps, decay_rate)
#lr = tf.train.exponential_decay(0.001, global_step=global_step, decay_steps=2, decay_rate=0.5, staircase=True)

tf.scalar_summary('learning_rate', lr)
opt = tf.train.AdamOptimizer(lr)
#...train loop and so on
It doesn't work at all. Here is the output:
I tensorflow/core/kernels/logging_ops.cc:79] step[0]
I tensorflow/core/kernels/logging_ops.cc:79] global[0]
E tensorflow/core/client/tensor_c_api.cc:485] The tensor returned for MergeSummary/MergeSummary:0 was not valid.
Traceback (most recent call last):
  File "flownet_new.py", line 528, in <module>
    summary_str = sess.run(summary_op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 382, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 655, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 723, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 743, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: The tensor returned for MergeSummary/MergeSummary:0 was not valid.
As you can see, there is no output for the decay steps. I can't even debug it! Now I have absolutely no idea how to do this with a single function. By the way, either I'm doing something wrong, or tf.contrib.slim doesn't work with learning rate decay. For now, the simplest solution is to do what cleros suggested directly in the train loop.
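For what it's worth, here is a stateless sketch that expands the reduce_sum counting idea from above into a full schedule, avoiding the Variable, the tf.cond and the list mutation entirely (my own untested assumption that this matches Caffe's semantics):

import tensorflow as tf

def multistep_decay(learning_rate, global_step, decay_steps, decay_rate, name=None):
    decay_steps = tf.convert_to_tensor(decay_steps, dtype=tf.float32)
    gs = tf.cast(global_step, tf.float32)
    # p = how many boundaries global_step has already crossed
    p = tf.reduce_sum(tf.cast(tf.greater(gs, decay_steps), tf.float32))
    return tf.mul(learning_rate, tf.pow(decay_rate, p), name=name)

lr = multistep_decay(0.01, global_step, [100, 200], 0.1)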
Answer 0 (score: 2)
I was looking for this feature in tensorflow and found that it can be implemented easily using tf.train.piecewise_constant. Here is an example from tensorflow's api_docs (https://www.tensorflow.org/api_docs/python/tf/train/piecewise_constant):
Example: use a learning rate of 1.0 for the first 100000 steps, 0.5 for steps 100001 to 110000, and 0.1 for any further steps.
global_step = tf.Variable(0, trainable=False)
boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
After that, we increment global_step every time we perform an optimization step.
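To mirror Caffe's multistep rule with this API, the values list can be derived from base_lr and gamma, for example (a sketch; the base_lr and gamma values here are only illustrative):

base_lr, gamma = 0.01, 0.1
boundaries = [100, 200]
values = [base_lr * gamma ** i for i in range(len(boundaries) + 1)]  # [0.01, 0.001, 0.0001]
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)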
Answer 1 (score: 1)
Use tf.train.exponential_decay(), which is exactly what you are looking for. The decayed learning rate is computed as:

decayed_learning_rate = learning_rate *
                        decay_rate ^ (global_step / decay_steps)

Note that the decay_steps parameter is an integer (not an array and not a tensor) holding the period, in iterations, after which the learning rate changes. In your example, decay_steps=100.
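A sketch of the suggested call; note that with staircase=True the rate drops once every decay_steps iterations, so this reproduces a multistep schedule only when the boundaries are evenly spaced (the base_lr and gamma values are illustrative):

learning_rate = tf.train.exponential_decay(0.01, global_step,
                                           decay_steps=100, decay_rate=0.1,
                                           staircase=True)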
Answer 2 (score: 0)
You can try case, switch and merge. For example, suppose base_lr is 0.1 and gamma is 0.1; then you can use
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops

global_step = tf.placeholder(dtype=tf.int64)
learning_rate = tf.case(
    [(tf.less(global_step, 100), lambda: tf.constant(0.1)),
     (tf.less(global_step, 200), lambda: tf.constant(0.01))],
    default=lambda: tf.constant(0.001))

with tf.Session() as sess:
    print(sess.run(learning_rate, {global_step: 0}))    # 0.1
    print(sess.run(learning_rate, {global_step: 1}))    # 0.1
    print(sess.run(learning_rate, {global_step: 99}))   # 0.1
    print(sess.run(learning_rate, {global_step: 100}))  # 0.01
    print(sess.run(learning_rate, {global_step: 101}))  # 0.01
    print(sess.run(learning_rate, {global_step: 199}))  # 0.01
    print(sess.run(learning_rate, {global_step: 200}))  # 0.001
    print(sess.run(learning_rate, {global_step: 201}))  # 0.001
or
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops

global_step = tf.placeholder(dtype=tf.int64)
learning_rate = control_flow_ops.merge(
    [control_flow_ops.switch(tf.constant(0.1),
                             tf.less(global_step, 100))[1],
     control_flow_ops.switch(tf.constant(0.01),
                             tf.logical_and(tf.greater_equal(global_step, 100),
                                            tf.less(global_step, 200)))[1],
     control_flow_ops.switch(tf.constant(0.001),
                             tf.greater_equal(global_step, 200))[1]])[0]

with tf.Session() as sess:
    print(sess.run(learning_rate, {global_step: 0}))    # 0.1
    print(sess.run(learning_rate, {global_step: 1}))    # 0.1
    print(sess.run(learning_rate, {global_step: 99}))   # 0.1
    print(sess.run(learning_rate, {global_step: 100}))  # 0.01
    print(sess.run(learning_rate, {global_step: 101}))  # 0.01
    print(sess.run(learning_rate, {global_step: 199}))  # 0.01
    print(sess.run(learning_rate, {global_step: 200}))  # 0.001
    print(sess.run(learning_rate, {global_step: 201}))  # 0.001
The code was tested with tensorflow 0.12.1.