I am trying to build a voting classifier that takes multiple pipelines as its estimators. I am fairly new to this. Here is the code I am using:
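import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Multinomial naive Bayes, used directly (not wrapped in a pipeline)
clf1 = MultinomialNB(alpha=0.99, fit_prior=True)

# SGD classifier on TF-IDF weighted unigram and bigram counts
clf2 = Pipeline([('vect', CountVectorizer(max_features=5000, ngram_range=(1, 2))),
                 ('tfidf', TfidfTransformer(use_idf=True)),
                 ('clf', SGDClassifier(alpha=0.001, learning_rate='optimal',
                                       loss='epsilon_insensitive', penalty='l2',
                                       n_iter=100, random_state=42))])

# Linear-kernel SVM on normalized term counts
clf3 = Pipeline([('vect', CountVectorizer(max_features=3500)),
                 ('tfidf', TfidfTransformer(use_idf=False)),
                 ('clf', SVC(random_state=42, kernel="linear", degree=1,
                             decision_function_shape=None))])

# Random forest on normalized term counts
clf4 = Pipeline([('vect', CountVectorizer(max_features=4000)),
                 ('tfidf', TfidfTransformer(use_idf=False)),
                 ('clf', RandomForestClassifier(random_state=42, criterion="entropy"))])

# Hard (majority) voting over the four estimators
eclf = VotingClassifier(estimators=[('mnb', clf1), ('sgd', clf2), ('svm', clf3), ('rf', clf4)],
                        voting='hard')
eclf = eclf.fit(train_data, train_label)
p = eclf.predict(test_data)
np.mean(p == test_class)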
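The code essentially builds four classifiers: multinomial naive Bayes, an SGD classifier, an SVM with a linear kernel, and a random forest. When I try to fit it on my data, I get the following error: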
could not convert string to float: "training string here"
If I call fit on any of the individual classifiers, the model runs fine. Can anyone help?
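
One possible explanation, offered as a guess rather than a confirmed diagnosis: `VotingClassifier` passes the same raw input to every estimator, and `clf1` is the only one that is not wrapped in a pipeline with a vectorizer, so it would receive the raw strings directly. Below is a minimal sketch of that idea, in which the naive Bayes model gets its own `CountVectorizer`. The names `toy_docs`, `toy_labels`, `nb_pipe`, and `rf_pipe` are hypothetical stand-ins, the toy data is made up, and only two estimators are shown to keep the sketch short.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for train_data / train_label.
toy_docs = ["good movie", "great film", "bad movie", "awful film"]
toy_labels = [1, 1, 0, 0]

# Assumption: the bare MultinomialNB was the estimator receiving raw strings.
# Wrapping it in a pipeline gives it a vectorizer of its own.
nb_pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", MultinomialNB(alpha=0.99, fit_prior=True)),
])

# Same structure as clf4 in the question, without the feature cap.
rf_pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer(use_idf=False)),
    ("clf", RandomForestClassifier(random_state=42, criterion="entropy")),
])

# Every estimator now accepts raw text, so the ensemble can be fit on strings.
eclf = VotingClassifier(estimators=[("mnb", nb_pipe), ("rf", rf_pipe)], voting="hard")
eclf.fit(toy_docs, toy_labels)
predictions = eclf.predict(["good film"])
```

If this is indeed the cause, the same wrapping applied to `clf1` in the original code should let `eclf.fit(train_data, train_label)` run on the raw training strings.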