I have recently been reproducing the char-RNN described at http://karpathy.github.io/2015/05/21/rnn-effectiveness/. The code has already been implemented in TensorFlow; the implementation I am referring to is at https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/model.py. I have a question about the following lines of that code:
#1 loss = seq2seq.sequence_loss_by_example([self.logits],
        [tf.reshape(self.targets, [-1])],
        [tf.ones([args.batch_size * args.seq_length])],
        args.vocab_size)
#2 self.cost = tf.reduce_sum(loss) / args.batch_size / args.seq_length
#3 self.final_state = last_state
#4 self.lr = tf.Variable(0.0, trainable=False)
#5 tvars = tf.trainable_variables()
#6 grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
        args.grad_clip)
#7 optimizer = tf.train.AdamOptimizer(self.lr)
#8 self.train_op = optimizer.apply_gradients(zip(grads, tvars))
My question is about line #4: why do we set the learning rate to 0? Is setting it to 0 the best way to initialize the learning rate?
Answer (score: 1)
Looking through the code, the learning rate is assigned a real value before it is ever used:
sess.run(tf.assign(model.lr, args.learning_rate * (args.decay_rate ** e)))
This is necessary because the learning rate is set to decay over time while the Adam optimizer is only constructed once, so the learning rate has to be a variable that the training loop can reassign each epoch without rebuilding the graph. Since the initial value is overwritten before the first training step, any value should work; zero just seems most aesthetically pleasing to me.
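For illustration, here is a minimal sketch of that pattern in TensorFlow 1.x. The toy model and the hyperparameter values (learning_rate, decay_rate, num_epochs) are placeholders of my own, not taken from the repository; only the non-trainable-variable-plus-tf.assign mechanism mirrors the code above.

import tensorflow as tf

# Hypothetical hyperparameters standing in for args.* in the repository.
learning_rate = 0.002
decay_rate = 0.97
num_epochs = 5

# Toy model: fit y = 2x with a single weight, just to exercise the pattern.
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)
cost = tf.reduce_mean(tf.square(w * x - y))

# The learning rate is a non-trainable variable: its initial value (0.0)
# is never used for training, because it is overwritten before the first step.
lr = tf.Variable(0.0, trainable=False)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(num_epochs):
        # Reassign the decayed learning rate once per epoch, exactly as the
        # training script does; the optimizer itself is never rebuilt.
        sess.run(tf.assign(lr, learning_rate * (decay_rate ** e)))
        _, c = sess.run([train_op, cost],
                        feed_dict={x: [1.0, 2.0, 3.0], y: [2.0, 4.0, 6.0]})
        print("epoch %d  lr %.5f  cost %.4f" % (e, sess.run(lr), c))

An alternative would be to feed the learning rate through a tf.placeholder on every run call; the variable-plus-tf.assign approach shown here is simply the one the repository uses.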