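Confusion matrix rows do not match the labels

Time: 2016-09-23 04:06:12

Tags: python scikit-learn confusion-matrix

I have created a confusion matrix that works fine, but its rows do not seem to correspond to the labels.

I have some lists of strings, split into train and test parts: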

 train + test:
 positive: 16 + 4 = 20
 negprivate:  53 + 14 = 67
 negstratified: 893 + 224 = 1117

The confusion matrix is built on the test data:

 [[  0  14   0]
 [  3 220   1]
 [  0   4   0]]

Here is the code:

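import logging

import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.metrics import accuracy_score, confusion_matrix

my_tags = ['negprivate', 'negstratified', 'positive']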

def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
    logging.info('plot_confusion_matrix')
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(my_tags))
    target_names = my_tags
    plt.xticks(tick_marks, target_names, rotation=45)
    plt.yticks(tick_marks, target_names)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label') 
    plt.show()

def evaluate_prediction(target, predictions, taglist, title="Confusion matrix"):
    logging.info('Evaluate prediction')
    print('accuracy %s' % accuracy_score(target, predictions))
    cm = confusion_matrix(target, predictions)
    print('confusion matrix\n %s' % cm)
    print('(row=expected, col=predicted)')
    print('rows: \n %s \n %s \n %s ' % (taglist[0], taglist[1], taglist[2]))

    cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    plot_confusion_matrix(cm_normalized, title + ' Normalized')

...

test_targets, test_regressors = zip(
    *[(doc.tags[0], doc2vec_model.infer_vector(doc.words, steps=20)) for doc in alltest]) 
logreg = linear_model.LogisticRegression(n_jobs=1, C=1e5)
logreg = logreg.fit(train_regressors, train_targets)
evaluate_prediction(test_targets, logreg.predict(test_regressors), my_tags, title=str(doc2vec_model))

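The point is that I actually have to look at the numbers in the resulting matrix and change the order of my_tags so that the two agree with each other. As far as I understand, this should happen automatically somewhere. Where, I wonder?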

2 Answers:

Answer 0: (score: 0)

I think it is just the sorted order of your labels, i.e. the output of np.unique(target).
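
To illustrate the point about ordering, here is a minimal sketch. It assumes scikit-learn is installed and rebuilds synthetic (true, predicted) pairs from the counts in the question's matrix; the pairs themselves are not the asker's actual data. When `labels` is not given, `confusion_matrix` uses the labels that appear in either argument, in sorted order. Passing `labels=` explicitly fixes the row and column order. The names `counts` and `display_order` are illustrative only.

```python
from sklearn.metrics import confusion_matrix

# Synthetic (true, predicted) pair counts reconstructed from the matrix
# printed in the question; this is an assumption, not the original data.
counts = {
    ('negprivate', 'negstratified'): 14,
    ('negstratified', 'negprivate'): 3,
    ('negstratified', 'negstratified'): 220,
    ('negstratified', 'positive'): 1,
    ('positive', 'negstratified'): 4,
}

y_true, y_pred = [], []
for (true_label, predicted_label), n in counts.items():
    y_true += [true_label] * n
    y_pred += [predicted_label] * n

# Default behaviour: rows and columns follow the sorted label order.
cm_default = confusion_matrix(y_true, y_pred)
print(cm_default)

# Explicit order: rows and columns follow the list passed as labels.
display_order = ['positive', 'negprivate', 'negstratified']
cm_custom = confusion_matrix(y_true, y_pred, labels=display_order)
print(cm_custom)
```

With the default order, the row sums are 14, 224 and 4, which match the test counts for negprivate, negstratified and positive given in the question. Passing the same list to `labels=` and to the plot's tick labels keeps the two in sync without reordering anything by hand.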

Answer 1: (score: 0)

It is always best to have integer class labels; everything seems to run more smoothly. You can get these with LabelEncoder, i.e.

from sklearn import preprocessing
my_tags = ['negprivate', 'negstratified', 'positive']
le = preprocessing.LabelEncoder()
new_tags = le.fit_transform(my_tags)

Now you will have [0 1 2] as your new tags. When plotting, you want the labels to be intuitive, so you can use inverse_transform to get the original labels back, i.e.

print(le.inverse_transform([0]))

Output:

['negprivate']
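
As a small sketch of how the encoder could supply tick labels, the fitted `LabelEncoder` exposes its sorted class names through the `classes_` attribute, so integer code `i` corresponds to `classes_[i]`. The deliberately unsorted input list and the variable names below are assumptions made for illustration.

```python
from sklearn import preprocessing

# Deliberately unsorted input, to show that the encoder sorts the classes.
raw_tags = ['positive', 'negstratified', 'negprivate']

le = preprocessing.LabelEncoder()
encoded = le.fit_transform(raw_tags)

# classes_ holds the label names in sorted order; index i is integer code i.
tick_labels = list(le.classes_)
print(encoded)
print(tick_labels)

# Decoding the integer codes recovers the original strings.
decoded = list(le.inverse_transform(encoded))
print(decoded)
```

Using `tick_labels` for both `plt.xticks` and `plt.yticks` would keep the axis names in the same order as the integer codes, and therefore in the same order as the rows of a confusion matrix computed on those codes.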