Leave One Out scores 100%, what am I missing here?

Time: 2016-06-12 15:19:40

Tags: scikit-learn cross-validation naivebayes

I'm trying to run leave-one-out CV on my articles, but when I run the program I get 100% accuracy and I can't figure out what I'm missing. Here is my code:

import numpy as np
from scipy.stats import sem
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.naive_bayes import MultinomialNB

# Load the documents from 'corpus' (one subfolder per class)
bunch = load_files('corpus', shuffle=False)

X = bunch.data
y = bunch.target

# Bag-of-words counts, dropping English stop words
count_vect = CountVectorizer(stop_words='english')
X_counts = count_vect.fit_transform(X)

# Re-weight the counts with TF-IDF
tfidf_transformer = TfidfTransformer()
X_tfidf = tfidf_transformer.fit_transform(X_counts)

# cross_val_score clones and refits the estimator on every fold, so no prior .fit() is needed
estimator = MultinomialNB()
# LeaveOneOut no longer takes the sample count; it uses the number of rows at split time
cv = LeaveOneOut()
scores = cross_val_score(estimator, X_tfidf, y, cv=cv)
print(scores)
print("Mean score: {0:.3f} (+/-{1:.3f})".format(np.mean(scores), sem(scores)))

What's strange is that the output is identical to the classes of my input data. My results:

[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
Mean score: 0.577 (+/-0.099)

The classes of my input data:

([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

I don't understand where my LOO CV is going wrong. :S

Any help would be appreciated.

1 answer:

Answer 0 (score: 1)

Isn't your LOOCV accuracy score the 0.577 printed on the last line?
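As a quick arithmetic check, both numbers on that last line can be recovered from the score array alone. This standard-library sketch assumes `scipy.stats.sem` is called with its default `ddof=1`, i.e. the sample standard deviation divided by sqrt(n):

```python
import math
import statistics

# Per-fold scores exactly as printed in the question: 11 zeros, then 15 ones.
scores = [0.0] * 11 + [1.0] * 15

# With leave-one-out each fold holds out one sample, so the mean of the
# 0/1 fold scores is the fraction of samples classified correctly (15/26).
mean_score = sum(scores) / len(scores)

# Standard error of the mean with ddof=1, matching scipy.stats.sem's default.
sem_score = statistics.stdev(scores) / math.sqrt(len(scores))

print("Mean score: {0:.3f} (+/-{1:.3f})".format(mean_score, sem_score))
# prints: Mean score: 0.577 (+/-0.099)
```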

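The cross_val_score function returns an array with one score per CV fold (accuracy by default for a classifier). The scores array you printed therefore holds per-fold accuracy scores, not predicted labels. With leave-one-out each fold tests a single sample, so every fold scores either 0 (misclassified) or 1 (correct). Your array looks like the labels because the model predicted class 1 for every held-out sample: the 11 class-0 samples were all misclassified and the 15 class-1 samples were all correct, which gives 15/26 ≈ 0.577.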
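A minimal sketch of the difference between per-fold scores and predictions, assuming scikit-learn 0.18 or later (for `sklearn.model_selection`). It does not use the asker's corpus: `DummyClassifier(strategy="most_frequent")` is a swapped-in stand-in for MultinomialNB that always predicts the majority class of its training fold, which reproduces the question's pattern for 11 class-0 and 15 class-1 labels. `cross_val_predict` returns the held-out prediction for each sample rather than a score.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict, cross_val_score

# Same label layout as in the question: 11 samples of class 0, then 15 of class 1.
y = np.array([0] * 11 + [1] * 15)
# DummyClassifier ignores the features, so a single column of zeros is enough.
X = np.zeros((len(y), 1))

# Stand-in model (not the asker's): predicts the most frequent training class.
clf = DummyClassifier(strategy="most_frequent")
loo = LeaveOneOut()

# One accuracy score per fold; each fold has one test sample, so 0.0 or 1.0.
scores = cross_val_score(clf, X, y, cv=loo)
# The held-out prediction for every sample, which is what the question expected to see.
preds = cross_val_predict(clf, X, y, cv=loo)

print(scores)
print(preds)
print("LOO accuracy: {0:.3f}".format(scores.mean()))
```

Here `scores` mirrors the labels while `preds` is all ones, which is the situation described above: a model that always answers class 1 scores 0 on every class-0 fold and 1 on every class-1 fold.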