Question

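I have implemented a basic neural network from scratch using TensorFlow and trained it on the Fashion-MNIST dataset. It trains correctly and reaches a test accuracy of roughly 88-90% over the 10 classes.

Now I have written a predict() function that uses the trained weights to predict the class of a given image. Here is the code: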

def predict(images, trained_parameters):

    Ws, bs = [], []
    parameters = {}

    for param in trained_parameters.keys():
        parameters[param] = tf.convert_to_tensor(trained_parameters[param])

    X = tf.placeholder(tf.float32, [images.shape[0], None], name = 'X')
    Z_L = forward_propagation(X, trained_parameters)

    p = tf.argmax(Z_L) # Working fine
    # p = tf.argmax(tf.nn.softmax(Z_L)) # not working if softmax is applied

    with tf.Session() as session:
        prediction = session.run(p, feed_dict={X: images})

    return prediction

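This function uses the forward_propagation() function, which returns the weighted sum of the last layer (Z) rather than the activations (A), because TensorFlow's tf.nn.softmax_cross_entropy_with_logits() requires Z instead of A: it computes A itself by applying softmax. Refer to this link for details.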

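Now, in the predict() function, when I make predictions using Z instead of A (the activations), it works correctly. If I compute softmax on Z (which gives the activations A of the last layer), it gives incorrect predictions.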

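Why does it give correct predictions on the weighted sums Z? Aren't we supposed to apply the softmax activation first (and compute A) and then make predictions?

If anyone wants to look at my entire code, here is the link to my Colab notebook: Link to Notebook Gist

So what am I missing here?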

Answer 1

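Most TF functions, such as tf.nn.softmax, assume by default that the batch dimension is the first one; that is a convention. I noticed in your code that your batch dimension is the second one, i.e. your output shape is (output_dim=10, batch_size=?). As a result, tf.nn.softmax, which normalizes over the last axis by default, is computing the softmax activation along the batch dimension instead of across the classes.

There is nothing wrong with not following conventions; one just needs to be aware of them. Computing the argmax of the softmax taken along the first axis should yield the desired results (it is equivalent to taking the argmax of the logits):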

p = tf.argmax(tf.nn.softmax(Z_L, axis=0))

In addition, I'd also recommend computing the argmax along the first axis in case more than one image is fed into the network.
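The effect of the axis choice can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the asker's code: the helper names and the logit values are made up for illustration, and the matrix layout (classes, batch) is assumed to match the shape described above. It mimics softmax over the class axis (like axis=0 here) and over the batch axis (like the default last axis), then takes the per-sample argmax.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_over_classes(matrix):
    """Softmax down each column of a (classes, batch) matrix, like axis=0."""
    columns = [softmax(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

def softmax_over_batch(matrix):
    """Softmax along each row of a (classes, batch) matrix, like axis=-1."""
    return [softmax(row) for row in matrix]

def argmax_over_classes(matrix):
    """Index of the largest entry in each column, like argmax with axis=0."""
    return [max(range(len(col)), key=col.__getitem__) for col in zip(*matrix)]

# Hypothetical logits: 3 classes (rows) x 2 samples (columns).
logits = [
    [5.0, 1.0],
    [4.0, 9.0],
    [0.0, 8.0],
]

from_logits = argmax_over_classes(logits)
from_class_softmax = argmax_over_classes(softmax_over_classes(logits))
from_batch_softmax = argmax_over_classes(softmax_over_batch(logits))

# Hypothetical single image: the batch axis has length 1, so softmax over
# it turns every entry into 1.0 and the argmax falls back to index 0.
single = [[1.0], [9.0], [8.0]]
single_from_logits = argmax_over_classes(single)
single_from_batch_softmax = argmax_over_classes(softmax_over_batch(single))

print(from_logits, from_class_softmax, from_batch_softmax)
print(single_from_logits, single_from_batch_softmax)
```

In this sketch the softmax over the class axis leaves the argmax unchanged, because within one sample every logit goes through the same increasing exponential and is divided by the same positive sum. Normalizing over the batch axis instead compares each class against other samples, so the per-sample ranking of the classes can change.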

Why does making predictions on the activation values (softmax) give incorrect results?
