Question

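I am training a CNN on 10K grayscale images. The network has 6 convolutional layers, 1 fully connected layer, and 1 output layer.

When I start training, the loss is crazy high but steadily decreases, yet my accuracy starts at 1.0 and also decreases. It then fluctuates, dropping from 72% to 30% and climbing back up again. Also, when I run acc.eval({x: test_images, y: test_labels}) on unseen images, the accuracy is about 16%.

Also, I have 6 classes, all of which are one-hot encoded.

I think I might be comparing the predicted output incorrectly, but I can't see the error in my code...

Here is my code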

pred = convolutional_network(x)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = pred))
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)

prediction = tf.nn.softmax(pred)
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
acc = tf.reduce_mean(tf.cast(correct, 'float'))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer()) # Initialize all the variables
    saver = tf.train.Saver()

    time_full_start = time.clock()
    print("RUNNING SESSION...")
    for epoch in range(num_epochs):
        train_batch_x = []
        train_batch_y = []
        epoch_loss = 0
        i = 0
        while i < len(images):
            start = i
            end = i+ batch_size
            train_batch_x = images[start:end]
            train_batch_y = labels[start:end]
            op , ac, loss_value = sess.run([train_op, acc, loss], feed_dict={x: train_batch_x, y: train_batch_y})
            epoch_loss += loss_value
            i += batch_size
        print('Epoch : ', epoch+1, ' of ', num_epochs, ' - Loss for epoch: ', epoch_loss, ' Accuracy: ', ac)

    time_full_end = time.clock()
    print('Full time elapse:', time_full_end - time_full_start)


    print('Accuracy:', acc.eval({x: test_images, y: test_labels}))

    save_path = saver.save(sess, MODEL_PATH)
    print("Model saved in file: " , save_path)

Here is the output

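Epoch : 1 of 100 - Loss for epoch: 8.94737603121e+13 Accuracy: 1.0
Epoch : 2 of 100 - Loss for epoch: 212052447727.0 Accuracy: 1.0
Epoch : 3 of 100 - Loss for epoch: 75150603462.2 Accuracy: 1.0
Epoch : 4 of 100 - Loss for epoch: 68164116617.4 Accuracy: 1.0
Epoch : 5 of 100 - Loss for epoch: 18505190718.8 Accuracy: 0.99
Epoch : 6 of 100 - Loss for epoch: 11373286689.0 Accuracy: 0.96
Epoch : 7 of 100 - Loss for epoch: 3129798657.75 Accuracy: 0.07
Epoch : 8 of 100 - Loss for epoch: 374790121.375 Accuracy: 0.58
Epoch : 9 of 100 - Loss for epoch: 105383792.938 Accuracy: 0.72
Epoch : 10 of 100 - Loss for epoch: 49705202.4844 Accuracy: 0.66
Epoch : 11 of 100 - Loss for epoch: 30214170.7909 Accuracy: 0.36
Epoch : 12 of 100 - Loss for epoch: 18653020.5084 Accuracy: 0.82
Epoch : 13 of 100 - Loss for epoch: 14793638.35 Accuracy: 0.39
Epoch : 14 of 100 - Loss for epoch: 10196079.7003 Accuracy: 0.73
Epoch : 15 of 100 - Loss for epoch: 6727522.37319 Accuracy: 0.47
Epoch : 16 of 100 - Loss for epoch: 4593769.05838 Accuracy: 0.68
Epoch : 17 of 100 - Loss for epoch: 3669332.09406 Accuracy: 0.44
Epoch : 18 of 100 - Loss for epoch: 2850924.81662 Accuracy: 0.59
Epoch : 19 of 100 - Loss for epoch: 1780678.12892 Accuracy: 0.51
Epoch : 20 of 100 - Loss for epoch: 1855037.40652 Accuracy: 0.61
Epoch : 21 of 100 - Loss for epoch: 1012934.52827 Accuracy: 0.53
Epoch : 22 of 100 - Loss for epoch: 649319.432669 Accuracy: 0.55
Epoch : 23 of 100 - Loss for epoch: 841660.786938 Accuracy: 0.57
Epoch : 24 of 100 - Loss for epoch: 490148.861691 Accuracy: 0.55
Epoch : 25 of 100 - Loss for epoch: 397315.021568 Accuracy: 0.5

......................

Epoch : 99 of 100 - Loss for epoch: 4412.61703086 Accuracy: 0.57
Epoch : 100 of 100 - Loss for epoch: 4530.96991658 Accuracy: 0.62

Full time elapse: 794.5787720000001

Test accuracy: 0.158095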

I have tried multiple learning rates and network sizes, but can't seem to get it to work. Any help would be greatly appreciated.

Answer 1

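Please note that my answer also draws on reviewing and debugging the full code (which is not visible in the question). Still, I think the issues below are general enough to be worth checking if you face a similar problem - you might just find your solution here!

Crazy high loss values probably mean that you are not correctly converting the input images from int8 to small float32 values (as it turned out, this was indeed the case) and that you don't use batch normalization and/or regularization (in fact, both were missing). Also, in this code: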

prediction = tf.nn.softmax(pred)
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))

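computing the softmax is completely unnecessary. Softmax is a strictly monotonic function that only rescales the predictions, so the largest value in pred is also the largest value in prediction. You get the same result with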

correct = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
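The claim that softmax does not change which class wins can be checked with a small standard-library sketch. The `softmax` and `argmax` helpers and the six logit values below are made up for illustration; they are not part of the question's code or of TensorFlow.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest element (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

logits = [2.0, -1.0, 5.5, 0.3, 5.4, -3.0]  # made-up scores for 6 classes
probs = softmax(logits)

# Softmax preserves the ordering of its inputs, so both argmaxes agree.
same_class = argmax(logits) == argmax(probs)
print(argmax(logits), argmax(probs), same_class)
```

Because the ordering is preserved, comparing `tf.argmax(pred, 1)` against the labels directly skips one operation without changing the computed accuracy.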

Given that your network is operating on extremely high values, it is possible that tf.nn.softmax(), when exponentiating and dividing by the sum, inadvertently reduces everything to zero, and then tf.argmax() just picks class 0 until the numbers come down a bit. On top of that, you do not accumulate ac:

op , ac, loss_value = sess.run([train_op, acc, loss], feed_dict={x: train_batch_x, y: train_batch_y})

So the epoch accuracy you print is not really an epoch accuracy; it is just the accuracy of the last batch. If your images are sorted by class and you do not randomize the batches, you might be getting class-zero images at the end of every epoch. That would explain why you get 100% accuracy in the first few epochs, until the super-high numbers come down a bit and softmax no longer zeroes everything out. (As it turned out, this was indeed the case.)
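One way to report a true per-epoch accuracy is to accumulate the batch accuracies weighted by batch size. The sketch below is plain Python; `epoch_accuracy` and the per-batch numbers are hypothetical and only illustrate the bookkeeping, not the asker's actual training loop.

```python
def epoch_accuracy(batch_results):
    """Sample-weighted mean accuracy over one epoch.

    batch_results: iterable of (batch_accuracy, batch_size) pairs.
    """
    correct = 0.0
    seen = 0
    for batch_acc, batch_size in batch_results:
        correct += batch_acc * batch_size  # estimated number of correct samples
        seen += batch_size
    return correct / seen if seen else 0.0

# Hypothetical per-batch values; the last batch is smaller and happens to score 1.0.
batches = [(0.50, 100), (0.40, 100), (1.00, 50)]
last_batch_acc = batches[-1][0]
whole_epoch_acc = epoch_accuracy(batches)
print(last_batch_acc, whole_epoch_acc)
```

In the question's loop this would correspond to adding `ac * len(train_batch_x)` to a running total after each `sess.run` call and dividing by `len(images)` at the end of the epoch.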

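Even with the above fixed, the network did not learn anything at all. It turned out that when he added randomization, the images and the labels were shuffled differently, which naturally yields a constant 1/6 accuracy.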
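A common way to avoid this mismatch is to draw a single permutation and index both arrays with it. The following is a minimal NumPy sketch; `shuffle_together` and the toy data are assumptions for illustration, and the asker's actual shuffling code is not shown in the question.

```python
import numpy as np

def shuffle_together(images, labels, seed=None):
    """Return images and labels reordered by one shared random permutation."""
    assert len(images) == len(labels)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(images))
    return images[perm], labels[perm]

# Toy data: image i is filled with the value i and belongs to class i % 6.
images = np.arange(12, dtype=np.float32).reshape(12, 1, 1) * np.ones((12, 4, 4), dtype=np.float32)
labels = np.eye(6, dtype=np.float32)[np.arange(12) % 6]  # one-hot, 6 classes

shuffled_images, shuffled_labels = shuffle_together(images, labels, seed=0)

# Every image must still sit next to its own label after shuffling.
pairs_intact = all(
    int(img[0, 0]) % 6 == int(np.argmax(lab))
    for img, lab in zip(shuffled_images, shuffled_labels)
)
print(pairs_intact)
```

Shuffling the two arrays with separate calls would draw two different permutations, which is the failure mode described above.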

With all of these issues fixed, the network was able to learn and reached 98% accuracy after 100 epochs.

Epoch: 100/100 loss: 6.20184610883 total loss: 25.4021390676 acc: 97.976191%
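Of the fixes listed above, the input scaling is the easiest to sketch. The example assumes the images arrive as 8-bit unsigned pixel values in the range 0..255, which is an assumption, since the loading code is not shown. `to_unit_float` is a hypothetical helper name.

```python
import numpy as np

def to_unit_float(images_uint8):
    """Convert 8-bit pixel values (0..255) to float32 values in [0, 1]."""
    return images_uint8.astype(np.float32) / 255.0

raw = np.array([[0, 128, 255]], dtype=np.uint8)  # a tiny stand-in for an image
scaled = to_unit_float(raw)
print(scaled.dtype, scaled.min(), scaled.max())
```

Feeding values in this small range, rather than raw 0..255 integers, is one of the usual ways to keep the initial loss from exploding.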

Tensorflow - accuracy starts at 1.0 and decreases along with the loss
