Is cross-validation overfitting?

Asked: 2017-08-30 15:12:49

Tags: python-3.x machine-learning scikit-learn cross-validation

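When I print scores, I get an accuracy of 0.90, and when I print CrossValScore, I get: [0.99382716 0.99382716 0.99689441 0.99689441 0.99689441]. It looks like the model is being tested on data it has already seen, and I want it to be tested on unseen data, but I can't tell where I went wrong. Also, when I change the max_features parameter to any number, I still get the same results for both scores and CrossValScore.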

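import ast

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# some preprocessing here..
# saving preprocessed train and test data
# (label_train and label_test are built in the omitted preprocessing)

# Load the preprocessed documents back from disk
traindata = ast.literal_eval(open('pretprocesirano.txt').read())
testdata = ast.literal_eval(open('pretprocesiranoTEST.py').read())

label_train = np.array(label_train)
label_test = np.array(label_test)

# Learn the vocabulary on the training data, splitting on whitespace
vectorizer = CountVectorizer(tokenizer=lambda x: x.split())
traindataCV = vectorizer.fit_transform(traindata)

# Rebuild a vectorizer from the training vocabulary and encode the test data
wordlist = vectorizer.vocabulary_
SavedVectorizer = CountVectorizer(vocabulary=wordlist)
testdataCV = SavedVectorizer.transform(testdata)

clf = MultinomialNB()
clf.fit(traindataCV, label_train)

# Accuracy on the held-out test set
scores = clf.score(testdataCV, label_test)

# 5-fold cross-validation on the training set only
CrossValScore = cross_val_score(clf, traindataCV, label_train, cv=5)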
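
For context on the two numbers: cross_val_score clones the estimator and refits it on four fifths of the training data in each fold, then scores it on the remaining fifth, so each fold score is already measured on rows that fit did not see. One thing that might be worth checking is that SavedVectorizer is built without the whitespace tokenizer used for the training data, and that max_features is never passed in the posted code. A minimal sketch of one possible setup is below, wrapping the vectorizer and classifier in a single Pipeline so the same tokenizer is applied everywhere and the vocabulary is relearned inside each fold. The toy corpus, the labels, and the build_model helper are made up for illustration and are not from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the preprocessed documents.
train_texts = [
    "good great fine", "great good nice", "nice fine good",
    "fine great nice", "good nice great",
    "bad awful poor", "awful bad worse", "poor worse bad",
    "worse awful poor", "bad poor awful",
]
train_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
test_texts = ["good fine nice", "awful worse bad"]
test_labels = [1, 0]


def build_model(max_features=None):
    """Illustrative helper: vectorizer and classifier in one estimator."""
    return make_pipeline(
        # Same whitespace tokenizer for training, CV folds, and test data.
        CountVectorizer(tokenizer=str.split, max_features=max_features),
        MultinomialNB(),
    )


model = build_model()

# Each fold refits the whole pipeline (vocabulary included) on 4/5 of the
# training texts and scores it on the held-out 1/5.
cv_scores = cross_val_score(model, train_texts, train_labels, cv=5)

# Final fit on all training texts, then score on the separate test set.
model.fit(train_texts, train_labels)
test_score = model.score(test_texts, test_labels)

# max_features only has an effect when it is passed to the vectorizer
# that actually learns the vocabulary.
small_model = build_model(max_features=3).fit(train_texts, train_labels)
vocab_size = len(small_model.named_steps["countvectorizer"].vocabulary_)

print(cv_scores, test_score, vocab_size)
```

With this layout, raw texts go into cross_val_score and score directly, so there is no second vectorizer to keep in sync. scikit-learn may emit a warning that token_pattern is unused when a custom tokenizer is given; that warning is harmless here. If the cross-validation scores still sit well above the test score after such a change, the gap would more likely point to the test documents differing from the training documents than to cross-validation reusing seen data.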

0 answers:

No answers