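I am classifying sample data into positive and negative sentiment, using the code snippet below.
Everything looks fine up to line 20, where the predictions are printed.
But when I try to measure accuracy with the metrics module, I get a "nan" value. Could you look at my code and help me find the problem?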
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import metrics
import csv
# Read in the training data.
with open("/Users/max/train.csv", 'r') as file:
    reviews = list(csv.reader(file))
with open("/Users/max/test.csv", 'r') as file:
    test_reviews = list(csv.reader(file))
vectorizer = TfidfVectorizer(min_df=1)
train_features = vectorizer.fit_transform([review[0] for review in reviews])
test_features = vectorizer.transform([test_review[0] for test_review in test_reviews])
nb = MultinomialNB()
nb.fit(train_features, [int(review[1]) for review in reviews])
predictions = nb.predict(test_features)
print("prediction : {0}".format(predictions))
actual = [int(r[1]) for r in test_reviews]
fpr, tpr, threshold = metrics.roc_curve(actual, predictions, pos_label=1)
print("Multinomial naive bayes AUC: {0}".format(metrics.auc(fpr, tpr)))
The sample dataset is in this format:
i like google , 1
i dont really like microsoft , -1
This is the console output:
prediction : [1 -1]
/Library/Python/2.7/site-packages/sklearn/metrics/ranking.py:496: UndefinedMetricWarning: No positive samples in y_true, true positive value should be meaningless UndefinedMetricWarning)
Multinomial naive bayes AUC: nan
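Answer 0: (score: 1)
Most likely your test data contains no true positive instances. The warning says as much: no label in y_true equals pos_label=1, so the true positive rate is undefined and the AUC comes out as nan.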
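One way to confirm this is to count the labels before calling metrics.roc_curve. The following is a minimal sketch, not part of the original code: check_binary_labels is a hypothetical helper, and the label lists are made-up stand-ins for `actual` from the question.

```python
from collections import Counter


def check_binary_labels(y_true, pos_label=1):
    """Return (n_positive, n_negative); raise if ROC AUC would be undefined."""
    counts = Counter(y_true)
    n_pos = counts.get(pos_label, 0)
    n_neg = sum(counts.values()) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError(
            "ROC AUC is undefined: {0} positive and {1} negative samples "
            "(label counts: {2})".format(n_pos, n_neg, dict(counts))
        )
    return n_pos, n_neg


# Hypothetical labels standing in for `actual` from the question.
print(check_binary_labels([1, -1, 1, -1]))  # (2, 2)

try:
    check_binary_labels([-1, -1])  # no sample equals pos_label
except ValueError as err:
    print(err)
```

If the check raises on the real test labels, the problem is in how test.csv is labeled or parsed rather than in the classifier.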
Answer 1: (score: 0)
I got this error when I mistakenly concatenated y_train and X_train and used the resulting dataframe as my y_train. Basically, the last column of y_train contained X_train data, which should not be the case. Hope this helps!
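A mix-up like this is easier to avoid when features and labels are split in one place. Below is a minimal sketch under assumptions: split_rows is a hypothetical helper, and the two sample rows simply reuse the format shown in the question.

```python
import csv
import io


def split_rows(rows, label_index=-1):
    """Split parsed CSV rows into (features, labels) with no overlap."""
    features, labels = [], []
    for row in rows:
        row = list(row)
        label = int(row.pop(label_index))  # take the label out of the features
        features.append(row)
        labels.append(label)
    return features, labels


# Two rows in the format shown in the question.
sample = io.StringIO("i like google , 1\ni dont really like microsoft , -1\n")
X_train, y_train = split_rows(csv.reader(sample))
print(X_train)  # [['i like google '], ['i dont really like microsoft ']]
print(y_train)  # [1, -1]
```

Because the label is popped from each row, it cannot end up in the feature columns, and y_train stays a flat list of integers.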