Scikit-learn - getting NaN values while measuring accuracy

Time: 2015-10-18 11:41:30

Tags: python

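I am classifying sample data into positive and negative sentiment, using the code snippet below.

Everything looks fine up to line 20, where the expected predictions are printed.

But when I try to measure accuracy with the metrics module, it gives me a "NaN" value. Could you look at my code and help me find the problem?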

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import metrics
import csv

# Read in the training data.
with open("/Users/max/train.csv", 'r') as file:
    reviews = list(csv.reader(file))

with open("/Users/max/test.csv",'r') as file:
    test_reviews = list(csv.reader(file))

vectorizer = TfidfVectorizer(min_df=1)
train_features = vectorizer.fit_transform([review[0] for review in reviews])
test_features = vectorizer.transform([test_review[0] for test_review in test_reviews])

nb = MultinomialNB()
nb.fit(train_features, [int(review[1]) for review in reviews])

predictions = nb.predict(test_features)
print("prediction : {0}".format(predictions))

actual = [int(r[1]) for r in test_reviews]
fpr, tpr, threshold = metrics.roc_curve(actual, predictions, pos_label=1) 
print("Multinomial naive bayes AUC: {0}".format(metrics.auc(fpr, tpr)))

The sample dataset has this format:

i like google , 1
i dont really like microsoft , -1

This is the console output:

prediction : [1 -1]
/Library/Python/2.7/site-packages/sklearn/metrics/ranking.py:496: UndefinedMetricWarning: No positive samples in y_true, true positive value should be meaningless
  UndefinedMetricWarning)
Multinomial naive bayes AUC: nan

2 answers:

Answer 0: (score: 1)

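It is possible that your data contains no truly positive instances. The warning says as much: "No positive samples in y_true" means none of the actual labels passed to roc_curve equals pos_label (1).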
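A quick way to test this is to count the labels in the test file before computing the ROC curve. The following is only a minimal sketch under stated assumptions: the helper names `label_counts` and `has_both_classes` are hypothetical, and the inline string stands in for the real test.csv, whose contents are not shown in the question.

```python
import csv
import io
from collections import Counter


def label_counts(rows, label_column=1):
    """Count how often each integer label occurs in parsed CSV rows."""
    # int() tolerates the surrounding spaces in values such as " -1".
    return Counter(int(row[label_column]) for row in rows)


def has_both_classes(rows, pos_label=1, label_column=1):
    """Return True if rows hold at least one positive and one other label."""
    counts = label_counts(rows, label_column)
    positives = counts.get(pos_label, 0)
    negatives = sum(counts.values()) - positives
    return positives > 0 and negatives > 0


# Hypothetical stand-in for test.csv: every row is labeled -1,
# so there is no positive sample for the ROC curve to work with.
sample_test_csv = "i dont really like microsoft , -1\ni dislike waiting , -1\n"
test_rows = list(csv.reader(io.StringIO(sample_test_csv)))

print(label_counts(test_rows))
print(has_both_classes(test_rows))
```

If `has_both_classes` returns False for the real test rows, the true positive rate has no positive samples to be computed from, which would be consistent with the warning in the question.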

Answer 1: (score: 0)

I got this error when I mistakenly concatenated data onto X_train and then used that dataframe as my y_train. Basically, X_train ended up with an extra last column that should not have been there. Hope this helps!