Question

I trained a neural network to recognize static images of signs from the sign-language alphabet.

After training, I tried to evaluate its performance with the Keras predict function:

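import cv2
import numpy as np

def predict_img(img, img_width, img_height, model):
    # weights_name is a global that is not shown in this snippet
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    model.load_weights(weights_name)
    # Resize, scale pixel values to [0, 1], and add a batch dimension
    img = cv2.resize(img, (img_width, img_height))
    img = img / 255.0
    img = img.reshape((1,) + img.shape)
    pred = model.predict(img, batch_size=1, verbose=1)
    # Index of the class with the highest predicted probability
    classes = np.argmax(pred)
    return classes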

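Basically, I call predict on the input image and return the index of the class with the highest probability.
I then call this function on every image in the test folder and check whether the prediction is correct. To do that, I compare the model's prediction with the first character of the file name, which is the letter for that sign.
I count the correct predictions so that I can compute the percentage of correct predictions over all test images. Here is the code: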

import os

# Read each test image with OpenCV and count the correct predictions
total_cases = 0
correct_predictions = 0
for subdir in os.listdir(folder):
    for filename in os.listdir(os.path.join(folder, subdir)):
        # cv2.imread returns the image in BGR channel order
        img = cv2.imread(os.path.join(folder, subdir, filename))
        pred = predict_img(img, img_width, img_height, model)
        total_cases += 1
        # Correct if the predicted label matches the first character of the file name
        if sign_labels[pred] == filename[0]:
            correct_predictions += 1

print(correct_predictions, "correct predictions on", total_cases, "total tests")
print((correct_predictions * 100) / total_cases, "%", "success on", folder)
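The counting logic above can be checked in isolation from the model. The following is a minimal sketch with hypothetical labels, file names, and predicted indices (none of them are taken from the actual test set); `accuracy_from_filenames` is an illustrative helper, not part of the script. It lowercases the first character before comparing, which is an assumption in case some file names start with an uppercase letter while `sign_labels` is lowercase.

```python
def accuracy_from_filenames(filenames, predicted_indices, labels):
    """Return (correct, total) by comparing each predicted label with the
    first character of the corresponding file name."""
    correct = 0
    for filename, pred in zip(filenames, predicted_indices):
        # Assumption: case-insensitive comparison of the leading character
        if labels[pred] == filename[0].lower():
            correct += 1
    return correct, len(filenames)

# Hypothetical data for illustration only
labels = ['a', 'b', 'c']
filenames = ['a_01.jpg', 'B_02.jpg', 'c_03.jpg', 'c_04.jpg']
predicted_indices = [0, 1, 0, 2]

correct, total = accuracy_from_filenames(filenames, predicted_indices, labels)
print(correct, "correct predictions on", total, "total tests")
print(correct * 100 / total, "% success")
```

If the counts from a check like this look right, the gap is more likely to come from how the images are loaded and preprocessed than from the bookkeeping.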

However, the accuracy measured this way is very low (about 50%).

I then used a confusion matrix to evaluate the model's performance:

from sklearn.metrics import classification_report, confusion_matrix

# CONFUSION MATRIX
Y_pred = model.predict_generator(testing_generator, nb_validation_samples // 64 + 1)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
cm = confusion_matrix(testing_generator.classes, y_pred)
print(cm)
print('Classification Report')
print(classification_report(testing_generator.classes, y_pred, target_names=sign_labels))
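`predict_generator` returns one prediction per sample it draws, so the number of steps decides whether `Y_pred` lines up with `testing_generator.classes`. A small sketch of the arithmetic, assuming 154 test images (the total support in the classification report) and the batch size of 64; `steps_needed` is a hypothetical helper, not a Keras function:

```python
import math

def steps_needed(num_samples, batch_size):
    """Smallest number of batches that covers every sample exactly once."""
    return math.ceil(num_samples / batch_size)

batch_size = 64
for num_samples in (154, 128):
    floor_plus_one = num_samples // batch_size + 1
    print(num_samples, "samples:", floor_plus_one, "steps with // + 1,",
          steps_needed(num_samples, batch_size), "steps needed")
```

With 154 samples both expressions give 3 steps, so this is probably not the cause here, but `// 64 + 1` would request one extra batch whenever the sample count is an exact multiple of 64.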

I got the following results:

Confusion Matrix
[[ 3  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  7  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  1  0]
 [ 0  0  0  0  0  0  0  0  0  5  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  7  0  0  0  1  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  7  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  5  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  2  0  0  0  0  6  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0]
 [ 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10  0  1  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  9  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  9  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  4  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  7  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  9]]

Classification Report
             precision    recall  f1-score   support

          a       0.75      1.00      0.86         3
          b       1.00      1.00      1.00         4
          c       1.00      1.00      1.00         4
          d       1.00      1.00      1.00         4
          e       1.00      1.00      1.00         9
          f       1.00      1.00      1.00         6
          h       1.00      1.00      1.00         7
          i       1.00      1.00      1.00        10
          k       0.67      0.80      0.73         5
          l       1.00      1.00      1.00         5
          m       1.00      0.88      0.93         8
          n       1.00      1.00      1.00         7
          o       1.00      1.00      1.00         5
          p       1.00      0.75      0.86         8
          q       0.89      1.00      0.94         8
          r       0.83      0.83      0.83        12
          t       1.00      1.00      1.00         9
          u       0.82      0.82      0.82        11
          v       1.00      0.80      0.89         5
          w       1.00      1.00      1.00         8
          x       0.88      1.00      0.93         7
          y       1.00      1.00      1.00         9

avg / total       0.95      0.94      0.94       154
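For a like-for-like comparison with the roughly 50% figure from the per-image loop, the overall accuracy implied by the confusion matrix is the sum of its diagonal divided by the sum of all entries. A sketch in plain Python, with the diagonal and the non-zero off-diagonal counts copied from the matrix above:

```python
# Diagonal of the confusion matrix above (correct predictions per class)
diagonal = [3, 4, 4, 4, 9, 6, 7, 10, 4, 5, 7, 7, 5, 6, 8, 10, 9, 9, 4, 8, 7, 9]
# Non-zero off-diagonal entries as (true_row, predicted_col, count)
off_diagonal = [(8, 20, 1), (10, 14, 1), (13, 8, 2),
                (15, 0, 1), (15, 17, 1), (17, 15, 2), (18, 17, 1)]

correct = sum(diagonal)
total = correct + sum(count for _, _, count in off_diagonal)
accuracy = correct / total
print(correct, "of", total, "correct:", round(accuracy * 100, 1), "%")
```

That is 145 of 154 images, in line with the 0.94 average recall in the report.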

My questions are:

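Is it normal to get such different results from the confusion matrix and from my Keras predict loop? Am I using Keras predict incorrectly?
The confusion matrix results differ each time I run the script. Is that normal?
Could the confusion matrix be producing biased results?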

Thanks!

Edit: here is the generator:

test_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

testing_generator = test_datagen.flow_from_directory(
    folder,
    target_size=(img_width, img_height),
    batch_size=64,
    class_mode='categorical',
    shuffle=False)
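The generator above applies random shear, zoom, and horizontal flips to the test images, while `predict_img` only resizes and rescales them, so the two evaluations do not feed the model the same inputs. As a toy illustration in plain Python (not the Keras implementation; `random_hflip` is a hypothetical helper), a random horizontal flip alone is enough to make repeated passes over the same image differ:

```python
import random

def random_hflip(image, rng):
    """Mirror a 2-D image (list of rows) left to right with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

image = [[1, 2, 3],
         [4, 5, 6]]
mirrored = [[3, 2, 1],
            [6, 5, 4]]

rng = random.Random()  # unseeded, like a fresh run of the script
outputs = [random_hflip(image, rng) for _ in range(50)]
distinct = {tuple(map(tuple, out)) for out in outputs}
print(len(distinct), "distinct versions of the same image across 50 draws")
```

For a repeatable evaluation, the test generator would presumably keep only `rescale=1. / 255` and drop the random transforms.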

The confusion matrix is based on Keras predictions.

0 answers: