Question

import tensorflow as tf

tf_X = tf.placeholder("float",[None,3073])
tf_Y = tf.placeholder("float",[None,10])
tf_W = tf.Variable(0.001*tf.random_normal([3073,10]))

#tf.random_uniform([3073,10],-0.1,0.1)#
tf_learning_rate = 0.0001

hypothesis = tf.nn.softmax(tf.matmul(tf_X,tf_W)) # output is the softmax probability for each class

cost = tf.reduce_mean(tf.reduce_sum(tf_Y*-tf.log(hypothesis), reduction_indices=1))

optimizer = tf.train.GradientDescentOptimizer(tf_learning_rate).minimize(cost)

init = tf.initialize_all_variables()



with tf.Session() as sess:
    sess.run(init)
    print sess.run(cost, feed_dict = {tf_X:X_dev,tf_Y:onehot_y_dev})
    for step in xrange(400):

        sess.run(optimizer, feed_dict = {tf_X:X_dev,tf_Y:onehot_y_dev}) # y must be one-hot encoded
        if step % 200 ==0:
            print step,sess.run(cost, feed_dict={tf_X:X_dev,tf_Y:onehot_y_dev})

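I am trying to implement softmax cross-entropy in TensorFlow. When I call sess.run(cost) it returns a number (2.322), but once I run GradientDescentOptimizer, the returned cost is NaN... What is happening here? Did I implement the optimizer incorrectly?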

Answer 1

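OK, I found the problem myself. I had always heard about exploding gradients, but the problem here is an exploding cost. I did not know there was such a thing as an exploding-cost problem.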

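After my first gradient descent step, my high learning rate made the cost blow up, because the cost becomes -log(a number very close to zero). If you run into the same problem, lower the learning rate, or clip the values (for example with tf.clip_by_value) to mitigate this kind of issue.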
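To make the failure mode concrete, here is a minimal pure-Python sketch (no TensorFlow needed) of how a saturated softmax turns the cost into NaN, and two common ways around it. The function names and the eps value of 1e-10 are illustrative assumptions, not part of the original code. The logits are made up to mimic what a too-large update step could produce.

```python
import math

def softmax(logits):
    # Subtract the max so exp() cannot overflow; tiny terms still underflow to 0.0.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def naive_cross_entropy(probs, onehot):
    # Mirrors sum(tf_Y * -tf.log(hypothesis)): log(0) is treated as -inf,
    # and 0 * inf evaluates to nan under IEEE floating point.
    total = 0.0
    for p, y in zip(probs, onehot):
        neg_log = -math.log(p) if p > 0.0 else float("inf")
        total += y * neg_log
    return total

def clipped_cross_entropy(probs, onehot, eps=1e-10):
    # Clip probabilities into [eps, 1.0] before taking the log (eps is an assumed value).
    total = 0.0
    for p, y in zip(probs, onehot):
        p = min(max(p, eps), 1.0)
        total += y * -math.log(p)
    return total

def stable_cross_entropy(logits, onehot):
    # Work from the logits directly: -log softmax(z)_i = logsumexp(z) - z_i.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return sum(y * (log_sum - z) for z, y in zip(logits, onehot))

# Hypothetical logits after an overly large update step; the true class is index 1.
logits = [1000.0, 0.0, -1000.0]
onehot = [0.0, 1.0, 0.0]
probs = softmax(logits)

print(naive_cross_entropy(probs, onehot))
print(clipped_cross_entropy(probs, onehot))
print(stable_cross_entropy(logits, onehot))
```

In the TensorFlow graph, the clipped variant would correspond to something like tf.log(tf.clip_by_value(hypothesis, 1e-10, 1.0)). The logits-based variant is what tf.nn.softmax_cross_entropy_with_logits is designed for, so passing tf.matmul(tf_X, tf_W) to it directly is probably the more robust fix. Clipping hides the symptom but also flattens the gradient for clipped entries, which is why lowering the learning rate is still worth doing.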

TensorFlow gradient optimizer returns NaN while the cross-entropy cost returns a number
