How do I create a confusion matrix from predictions and ground-truth labels with TensorFlow?

Time: 2016-03-02 19:47:03

Tags: python tensorflow confusion-matrix

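I have implemented a neural network model for classification with TensorFlow. However, I do not know how to build a confusion matrix from the predictions; so far I only compute the accuracy. I am not a TensorFlow expert and am still learning. I have pasted my code below. Please tell me how to extend it so that it produces a confusion matrix: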

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Set logs writer into folder /tmp/tensorflow_logs
    #summary_writer = tf.train.SummaryWriter('/tmp/tensorflow_logs', graph_def=sess.graph_def)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(X_train.shape[0]/batch_size)

        # Loop over total length of batches
        for i in range(total_batch):
            # Pick a random batch of the given size from the training set
            batch_xs, batch_ys = w2v_utils.nextBatch(X_train, y_train, batch_size)
            # Fit training using batch data
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys})/total_batch
            # Write logs at every iteration
            #summary_str = sess.run(merged_summary_op, feed_dict={x: batch_xs, y: batch_ys})
            #summary_writer.add_summary(summary_str, epoch*total_batch + i)

        # Append loss
        loss_history.append(avg_cost)

        # Display logs per epoch step
        if (epoch % display_step == 0):
            correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
            # Calculate training accuracy
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
            trainAccuracy = accuracy.eval({x: X_train, y: y_train})
            train_acc_history.append(trainAccuracy)
            # Calculate validation accuracy
            valAccuracy = accuracy.eval({x: X_val, y: y_val})
            val_acc_history.append(valAccuracy)
            print("Epoch: %04d cost= %.9f train= %s val= %s" % (epoch+1, avg_cost, trainAccuracy, valAccuracy))

    print("Optimization Finished!")
    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy (eval needs the session, so this stays inside the with block)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Final Training Accuracy: %s" % accuracy.eval({x: X_train, y: y_train}))
    print("Final Test Accuracy: %s" % accuracy.eval({x: X_test, y: y_test}))
    print("Final Gold Accuracy: %s" % accuracy.eval({x: X_gold, y: y_gold}))

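So far I can print the prediction scores, but I have not managed to implement a confusion matrix. Please help. Note: I use one-hot vectors to represent my labels.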
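With one-hot labels like these, a confusion matrix only needs the index of the largest entry in each label row and each score row. The following is a minimal, framework-independent sketch in plain Python, not TensorFlow's own implementation; the helper names `argmax` and `confusion_matrix` and the sample data are hypothetical and chosen only for illustration.

```python
def argmax(row):
    """Return the index of the largest value in a sequence."""
    return max(range(len(row)), key=lambda i: row[i])


def confusion_matrix(one_hot_labels, scores, num_classes):
    """Build a matrix whose rows are true classes and columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for label_row, score_row in zip(one_hot_labels, scores):
        matrix[argmax(label_row)][argmax(score_row)] += 1
    return matrix


# Made-up example data: 4 samples, 3 classes
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
scores = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]]

cm = confusion_matrix(labels, scores, 3)
for row in cm:
    print(row)
```

In the session from the question, the score rows would presumably come from evaluating `pred` on the test inputs; that mapping is an assumption about the asker's model and is not shown in the posted code.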

2 Answers:

Answer 0 (score: 6)

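If you want to produce a confusion matrix and then derive precision and recall from it, you first need the counts of true positives, true negatives, false positives, and false negatives. Here is how: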

I wrote the code very verbosely for better readability. Note that it assumes binary labels (0 or 1).

def evaluation(logits,labels):
    """Returns the 4 counts (TP, FP, TN, FN) needed for precision, recall and F1 score"""


    # Step 1:
    # Let's create 2 vectors that will contain boolean values, and will describe our labels

    is_label_one = tf.cast(labels, dtype=tf.bool)
    is_label_zero = tf.logical_not(is_label_one)
    # Imagine that labels = [0,1]
    # Then
    # is_label_one = [False,True]
    # is_label_zero = [True,False]

    # Step 2:
    # get the prediction and false prediction vectors. correct_prediction is something that you choose within your model.
    correct_prediction = tf.nn.in_top_k(logits, labels, 1, name="correct_answers")
    false_prediction = tf.logical_not(correct_prediction)

    # Step 3:
    # get the 4 metrics by comparing boolean vectors
    # TRUE POSITIVES
    true_positives = tf.reduce_sum(tf.to_int32(tf.logical_and(correct_prediction,is_label_one)))

    # FALSE POSITIVES
    false_positives = tf.reduce_sum(tf.to_int32(tf.logical_and(false_prediction, is_label_zero)))

    # TRUE NEGATIVES
    true_negatives = tf.reduce_sum(tf.to_int32(tf.logical_and(correct_prediction, is_label_zero)))

    # FALSE NEGATIVES
    false_negatives = tf.reduce_sum(tf.to_int32(tf.logical_and(false_prediction, is_label_one)))


    return true_positives, false_positives, true_negatives, false_negatives

# Now you can do something like this in your session:

true_positives, \
false_positives, \
true_negatives, \
false_negatives = sess.run(evaluation(logits,labels), feed_dict=feed_dict)

# you can print the confusion matrix using the 4 values from above, or get precision and recall:
precision = float(true_positives) / float(true_positives+false_positives)
recall = float(true_positives) / float(true_positives+false_negatives)

# or F1 score:
F1_score = 2 * ( precision * recall ) / ( precision+recall )
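The four counts above can also be arranged into a 2x2 confusion matrix. The sketch below reproduces the same bookkeeping in plain Python so that it can be checked without a TensorFlow session; `binary_counts`, `safe_div`, and the sample labels are hypothetical names and data, and the zero-denominator guard is an addition that the answer's code does not have.

```python
def binary_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for labels and predictions given as 0/1 integers."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return float(numerator) / denominator if denominator else 0.0


# Made-up example data
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

tp, fp, tn, fn = binary_counts(y_true, y_pred)

# Rows are true classes (0, 1), columns are predicted classes (0, 1)
confusion = [[tn, fp],
             [fn, tp]]

precision = safe_div(tp, tp + fp)
recall = safe_div(tp, tp + fn)
f1 = safe_div(2 * precision * recall, precision + recall)
print(confusion, precision, recall, f1)
```

Guarding the divisions matters in practice: if the model never predicts the positive class, `tp + fp` is zero and the unguarded precision formula raises a `ZeroDivisionError`.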

Answer 1 (score: 0)

Currently, I use this solution to get the confusion matrix:

# load the data
(train_x, train_y), (dev_x, dev_y), (test_x, test_y) = dataLoader.load()

# build the classifier
classifier = tf.estimator.DNNClassifier(...)

# train the classifier
classifier.train(input_fn=lambda:train_input_fn(), steps=1000)

# evaluate and prediction on the test set
test_evaluate = classifier.evaluate(input_fn=lambda:eval_input_fn())
test_predict = classifier.predict(input_fn = lambda:eval_input_fn())

# parse the prediction to retrieve the predicted labels
predictions = []

for i in list(test_predict):
    predictions.append(i['class_ids'][0])

# build the confusion matrix (rows: true labels, columns: predicted labels)
matrix = tf.confusion_matrix(test_y, predictions)

# display the confusion matrix
with tf.Session():
    print(matrix.eval())

However, I am not happy with the loop that retrieves the predicted labels. There should be a more Pythonic way to do this (or a TensorFlow way).
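One possible simplification of that loop is a list comprehension over the prediction generator. The sketch below uses a hypothetical stand-in, `fake_predict`, in place of `classifier.predict(...)`; it assumes only what the loop above already relies on, namely that each yielded item is a dict whose `'class_ids'` entry holds the predicted class at index 0.

```python
def fake_predict():
    """Hypothetical stand-in for the generator returned by classifier.predict."""
    for class_id in [2, 0, 1, 1]:
        yield {"class_ids": [class_id]}


# Iterate over the generator directly; wrapping it in list() first is not needed
predictions = [p["class_ids"][0] for p in fake_predict()]
print(predictions)
```

This does the same work as the explicit loop in one expression and avoids materialising the intermediate list of dicts.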