Why does model.evaluate() differ from accuracy computed manually with a for loop?

Date: 2021-04-27 16:09:15

Tags: python tensorflow machine-learning keras deep-learning

After working through the transfer-learning tutorial on TensorFlow's site, I have a question about how model.evaluate() works compared with computing accuracy manually.

At the end, after fine-tuning, in the Evaluation and prediction section, we use model.evaluate() to compute accuracy on the test set as follows:

loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)
6/6 [==============================] - 2s 217ms/step - loss: 0.0516 - accuracy: 0.9740
Test accuracy : 0.9739583134651184
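For a binary classifier, the accuracy that evaluate() reports is a streaming metric: it keeps a running count of correct predictions over all batches and divides by the total number of examples. A minimal numpy sketch of that accumulation, assuming sigmoid outputs thresholded at 0.5 (the logits, labels, and batch sizes below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Streaming accuracy: accumulate correct/total across batches,
# the way a running metric would during evaluation.
total_correct, total_seen = 0, 0
batches = [
    (np.array([2.1, -0.3, 1.7]), np.array([1, 0, 1])),  # (logits, labels)
    (np.array([-1.2, 0.9]),      np.array([0, 1])),
]
for logits, labels in batches:
    preds = (sigmoid(logits) >= 0.5).astype(int)        # probability threshold 0.5
    total_correct += int((preds == labels).sum())
    total_seen += labels.size
accuracy = total_correct / total_seen
print(accuracy)  # 1.0 here: every toy example lands on the correct side of 0.5
```

Because the counts are pooled before dividing, this is a true per-example average, not an average of per-batch averages, so unequal batch sizes do not skew it.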

Next, as part of a visualization exercise, predictions are generated manually for a batch of images from the test set:

# Apply a sigmoid since our model returns logits
predictions = tf.nn.sigmoid(predictions)
predictions = tf.where(predictions < 0.5, 0, 1)
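Since sigmoid(0) = 0.5 and the sigmoid is monotonic, thresholding probabilities at 0.5 is equivalent to thresholding the raw logits at 0. A small numpy sketch of that equivalence (the logit values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-3.0, -0.1, 0.0, 0.2, 4.5])
via_prob  = (sigmoid(logits) >= 0.5).astype(int)  # threshold probabilities at 0.5
via_logit = (logits >= 0.0).astype(int)           # threshold raw logits at 0
print(via_prob)   # [0 0 1 1 1]
print(via_logit)  # [0 0 1 1 1]
```

So if an evaluation path thresholds logits at 0 while a manual path thresholds probabilities at 0.5, the two should still agree; a mismatch only appears if one path uses a threshold on a different scale than the other.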

However, this can also be extended to compute predictions for the entire test set and compare them against the true labels to produce an average accuracy:

all_acc=tf.zeros([], tf.int32) #initialize array to hold all accuracy indicators (single element)
for image_batch, label_batch in test_dataset.as_numpy_iterator():
    predictions = model.predict_on_batch(image_batch).flatten() #run batch through model and return logits
    predictions = tf.nn.sigmoid(predictions) #apply sigmoid activation function to transform logits to [0,1]
    predictions = tf.where(predictions < 0.5, 0, 1) #round down or up accordingly since it's a binary classifier
    accuracy = tf.where(tf.equal(predictions,label_batch),1,0) #correct is 1 and incorrect is 0
    all_acc = tf.experimental.numpy.append(all_acc, accuracy)
all_acc = all_acc[1:]  #drop first placeholder element
avg_acc = tf.math.reduce_mean(tf.dtypes.cast(all_acc, tf.float16)) 
print('My Accuracy:', avg_acc.numpy()) 
My Accuracy: 0.974
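As an aside, the placeholder-element-then-drop pattern above can be avoided by collecting per-batch correctness arrays in a Python list and concatenating once at the end. A numpy sketch of the same accumulation, with made-up predicted and true labels standing in for the model's batches:

```python
import numpy as np

batches = [
    (np.array([1, 0, 1]), np.array([1, 0, 0])),  # (predicted labels, true labels)
    (np.array([0, 1]),    np.array([0, 1])),
]
correct_chunks = []
for preds, labels in batches:
    correct_chunks.append((preds == labels).astype(int))  # 1 = correct, 0 = incorrect
all_correct = np.concatenate(correct_chunks)  # no placeholder element to drop
avg_acc = all_correct.mean()
print(avg_acc)  # 4 correct out of 5 -> 0.8
```

Concatenating once is also cheaper than appending to a tensor inside the loop, which copies the accumulated array on every batch.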

Now, if model.evaluate() generates predictions by applying a sigmoid to the model's logit output and using the 0.5 threshold suggested by the tutorial, then my manually computed accuracy should equal the accuracy reported by TensorFlow's model.evaluate(). That is indeed the case for the tutorial: My Accuracy: 0.974 = the accuracy from model.evaluate(). However, when I try the same code with a model trained on the same convolutional base as the tutorial, but on different Gabor images (rather than the tutorial's cats and dogs), my accuracy no longer matches the model.evaluate() accuracy:

current_set = set9 #define set to process. must do all nine, one at a time
all_acc=tf.zeros([], tf.int32) #initialize array to hold all accuracy indicators (single element)
loss, acc = model.evaluate(current_set) #now test the model's performance on the test set
for image_batch, label_batch in current_set.as_numpy_iterator():
    predictions = model.predict_on_batch(image_batch).flatten() #run batch through model and return logits
    predictions = tf.nn.sigmoid(predictions) #apply sigmoid activation function to transform logits to [0,1]
    predictions = tf.where(predictions < 0.5, 0, 1) #round down or up accordingly since it's a binary classifier
    accuracy = tf.where(tf.equal(predictions,label_batch),1,0) #correct is 1 and incorrect is 0
    all_acc = tf.experimental.numpy.append(all_acc, accuracy)
all_acc = all_acc[1:]  #drop first placeholder element
avg_acc = tf.math.reduce_mean(tf.dtypes.cast(all_acc, tf.float16))
print('My Accuracy:', avg_acc.numpy()) 
print('Tf Accuracy:', acc) 
My Accuracy: 0.7183
Tf Accuracy: 0.6240000128746033

Does anyone know why there is a discrepancy? Does model.evaluate() not use a sigmoid? Or does it use a threshold other than 0.5? Or is it something else I haven't considered? Note that my new model was trained on different images than the tutorial's cats and dogs, but the code is identical.
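To illustrate the threshold suspicion: the same logits can score very differently under two thresholds whenever many outputs fall between them. A hedged numpy sketch with made-up logits and labels, comparing a 0.5 probability threshold against a hypothetical 0.6 one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accuracy_at(logits, labels, prob_threshold):
    """Accuracy of sigmoid(logits) thresholded at prob_threshold."""
    preds = (sigmoid(logits) >= prob_threshold).astype(int)
    return (preds == labels).mean()

logits = np.array([0.3, 0.1, -0.2, 0.6, -1.0])  # several small positive logits
labels = np.array([0,   0,    0,   1,    0])
print(accuracy_at(logits, labels, 0.5))  # 0.6: logits 0.3 and 0.1 clear 0.5 and are wrong
print(accuracy_at(logits, labels, 0.6))  # 1.0: the stricter threshold filters them out
```

This does not establish what evaluate() actually does internally, only that a threshold mismatch is one mechanism that could produce a gap of this size.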

Thanks in advance for your help!

0 Answers