NaN values in the loss function (MSE) in TensorFlow

Asked: 2016-05-12 00:20:44

Tags: tensorflow

I want to use a feed-forward neural network in TensorFlow to output continuous real values. My input values are, of course, continuous real values as well.

I want my network to have two hidden layers and to use MSE as the cost function, so I have defined them like this:

import math
import tensorflow as tf

def mse(logits, outputs):
    # Mean squared error between predictions (logits) and targets (outputs).
    mse = tf.reduce_mean(tf.pow(tf.sub(logits, outputs), 2.0))
    return mse

def training(loss, learning_rate):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(loss)
    return train_op

def inference_two_hidden_layers(images, hidden1_units, hidden2_units):
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([WINDOW_SIZE, hidden1_units],
                                stddev=1.0 / math.sqrt(float(WINDOW_SIZE))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)

    with tf.name_scope('hidden2'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, hidden2_units],
                                stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden2_units]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

    with tf.name_scope('identity'):
        # Linear output layer: one continuous value, no activation.
        weights = tf.Variable(
            tf.truncated_normal([hidden2_units, 1],
                                stddev=1.0 / math.sqrt(float(hidden2_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([1]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases

    return logits
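
For reference, these pieces are wired together roughly as follows (a minimal sketch; the placeholder names and layer sizes are hypothetical, and the API matches the TensorFlow 0.x era of the code above):

inputs_ph = tf.placeholder(tf.float32, shape=[None, WINDOW_SIZE])
targets_ph = tf.placeholder(tf.float32, shape=[None, 1])

logits = inference_two_hidden_layers(inputs_ph, hidden1_units=128, hidden2_units=32)
loss = mse(logits, targets_ph)
train_op = training(loss, learning_rate=0.01)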

I am doing batch training, and at every step I evaluate the train_op and loss ops:

_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
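
A hypothetical guard like the following makes it easy to see at which step the loss first diverges (here step is assumed to be the training-loop counter):

import numpy as np

if np.isnan(loss_value):
    print('Loss became NaN at step %d' % step)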

The problem is that I get some NaN values when evaluating the loss function. This does not happen if I use a network with only one hidden layer:

def inference_one_hidden_layer(inputs, hidden1_units):
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([WINDOW_SIZE, hidden1_units],
                                stddev=1.0 / math.sqrt(float(WINDOW_SIZE))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(inputs, weights) + biases)

    with tf.name_scope('identity'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, NUM_CLASSES],
                                stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden1, weights) + biases

    return logits

Why do I get NaN loss values when using a network with two hidden layers?

1 Answer:

Answer 0 (score: 3):

Mind your learning rate. If you enlarge your network, you have more parameters to learn. That means you also need to decrease the learning rate.

With a high learning rate, your weights will explode, and as a result your output values will explode as well.
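
For illustration, here are two common remedies, sketched against the training() function from the question; the concrete learning rate and clipping bounds are placeholder values, not tuned ones:

# Remedy 1: use a smaller learning rate for the larger network.
train_op = training(loss, learning_rate=0.001)

# Remedy 2: clip the gradients so that no single update can blow up the weights.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)

Gradient clipping caps the size of each update, while a smaller learning rate attacks the same exploding-weights problem more directly.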