When I print scores I get an accuracy of 0.90, and when I print CrossValScore I get:
[0.99382716 0.99382716 0.99689441 0.99689441 0.99689441]
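It looks as if the model is being tested on data it has already seen. I want it to be tested on unseen data, but I can't see where I went wrong.
Also, when I set the max_features parameter to any number, I still get exactly the same results for both scores and CrossValScore.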
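import ast
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# some preprocessing here...
# saving preprocessed train and test data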
traindata = ast.literal_eval(open('pretprocesirano.txt').read())
testdata = ast.literal_eval(open('pretprocesiranoTEST.py').read())
label_train = np.array(label_train)
label_test = np.array(label_test)

# learn the vocabulary from the training data only
vectorizer = CountVectorizer(tokenizer=lambda x: x.split())
traindataCV = vectorizer.fit_transform(traindata)
wordlist = vectorizer.vocabulary_
# second vectorizer reuses the training vocabulary
# (note: it uses the default tokenizer, not the lambda above)
SavedVectorizer = CountVectorizer(vocabulary=wordlist)
testdataCV = SavedVectorizer.transform(testdata)

clf = MultinomialNB()
clf.fit(traindataCV, label_train)
scores = clf.score(testdataCV, label_test)  # accuracy on the separate test set
CrossValScore = cross_val_score(clf, traindataCV, label_train, cv=5)  # 5-fold CV on the training set
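
For comparison, here is a minimal sketch of two things that may be relevant here. It is not a confirmed diagnosis. First, putting the vectorizer and classifier in one pipeline means the vocabulary is relearned inside each cross-validation fold, so no fold is scored with a vocabulary built from its own held-out documents. Second, scikit-learn documents that max_features is ignored when a fixed vocabulary is passed to CountVectorizer, which would explain unchanged results if (an assumption, since the post does not show it) max_features was set on a vectorizer like SavedVectorizer. The toy corpus, the labels, and the build_model helper are hypothetical stand-ins, not the original data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for traindata / label_train.
docs = [
    "good great fine", "great good nice", "nice fine good",
    "fine great nice", "good nice great",
    "bad awful poor", "awful bad worse", "worse poor bad",
    "poor awful worse", "bad worse awful",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def build_model(max_features=None):
    """Hypothetical helper: vectorizer and classifier in a single pipeline."""
    return make_pipeline(
        # str.split mirrors the whitespace tokenizer used in the post;
        # token_pattern=None avoids the "token_pattern will not be used" warning.
        CountVectorizer(tokenizer=str.split, token_pattern=None,
                        max_features=max_features),
        MultinomialNB(),
    )


# The pipeline receives raw text, so the vocabulary is refit on the
# training part of every fold and the held-out part stays unseen.
cv_scores = cross_val_score(build_model(), docs, labels, cv=5)
cv_scores_limited = cross_val_score(build_model(max_features=2), docs, labels, cv=5)

# max_features limits the vocabulary only when the vocabulary is learned.
full = CountVectorizer(tokenizer=str.split, token_pattern=None).fit(docs)
small = CountVectorizer(tokenizer=str.split, token_pattern=None,
                        max_features=3).fit(docs)

# With a fixed vocabulary, max_features has no effect.
fixed = CountVectorizer(vocabulary=full.vocabulary_, max_features=3)

print(len(full.vocabulary_), len(small.vocabulary_), fixed.transform(docs).shape[1])
print(cv_scores, cv_scores_limited)
```

Passing raw documents to cross_val_score is what lets the pipeline refit the vectorizer per fold; passing an already vectorized matrix, as in the post, fixes the vocabulary before the folds are split.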