Providing multiple pipelines as inputs to a voting classifier - sklearn

Time: 2016-11-13 01:20:36

Tags: python-3.x machine-learning scikit-learn sklearn-pandas

I am trying to build a voting classifier that takes multiple pipelines as inputs. I am fairly new to this. Here is the code I am using:

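import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# clf1 is a bare estimator; clf2, clf3 and clf4 are text pipelines.
clf1 = MultinomialNB(alpha=0.99, fit_prior=True)
clf2 = Pipeline([('vect', CountVectorizer(max_features=5000, ngram_range=(1, 2))),
                 ('tfidf', TfidfTransformer(use_idf=True)),
                 ('clf', SGDClassifier(alpha=0.001, learning_rate='optimal', loss='epsilon_insensitive',
                                       penalty='l2', n_iter=100, random_state=42))])
clf3 = Pipeline([('vect', CountVectorizer(max_features=3500)),
                 ('tfidf', TfidfTransformer(use_idf=False)),
                 ('clf', SVC(random_state=42, kernel="linear", degree=1, decision_function_shape=None))])
clf4 = Pipeline([('vect', CountVectorizer(max_features=4000)),
                 ('tfidf', TfidfTransformer(use_idf=False)),
                 ('clf', RandomForestClassifier(random_state=42, criterion="entropy"))])

# Hard (majority) voting over the four estimators.
eclf = VotingClassifier(estimators=[('mnb', clf1), ('sgd', clf2), ('svm', clf3), ('rf', clf4)], voting='hard')
eclf = eclf.fit(train_data, train_label)

p = eclf.predict(test_data)
np.mean(p == test_class)  # fraction of test predictions that match the labels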

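The code builds four classifiers: multinomial naive Bayes, an SGD classifier, an SVM with a linear kernel, and a random forest classifier. When I try to fit it on my data, I get the following error: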

could not convert string to float: "training string here"
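
A message like this usually means an estimator received raw text where it expected numbers. The sketch below is one way to check that in isolation. It assumes the training data is an array of strings; `docs` and `labels` are made-up toy values standing in for `train_data` and `train_label`, not the asker's data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for train_data / train_label.
docs = np.array(["good movie", "great film", "bad movie", "awful film"], dtype=object)
labels = np.array([1, 1, 0, 0])

# Raw strings: the estimator tries to cast the text to floats and fails.
try:
    MultinomialNB().fit(docs, labels)
    raw_error = None
except ValueError as exc:
    raw_error = str(exc)
print(raw_error)

# Token counts: the same estimator accepts the vectorized input.
counts = CountVectorizer().fit_transform(docs)
nb = MultinomialNB().fit(counts, labels)
print(nb.predict(counts))
```

If the first call fails and the second succeeds, the classifier itself is fine and the problem is the missing vectorization step in front of it.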

If I call fit on the individual classifiers, the model runs fine. Can anyone help?
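
One likely cause, given the code above: `VotingClassifier` passes the same `train_data` to every estimator, and `clf1` is the only one without a `CountVectorizer` in front of it, so it would see raw strings. A minimal sketch of a possible fix follows, assuming that diagnosis is right. The helper `text_pipeline` and the toy `docs` / `labels` are hypothetical names introduced for illustration, and only two of the four estimators are shown to keep it short.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for train_data / train_label.
docs = ["good movie", "great film", "bad movie", "awful film"] * 5
labels = [1, 1, 0, 0] * 5


def text_pipeline(clf):
    """Hypothetical helper: put the text-to-features steps in front of clf."""
    return Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", clf),
    ])


# Every estimator, including MultinomialNB, now vectorizes the text itself.
eclf = VotingClassifier(
    estimators=[
        ("mnb", text_pipeline(MultinomialNB(alpha=0.99, fit_prior=True))),
        ("sgd", text_pipeline(SGDClassifier(random_state=42))),
    ],
    voting="hard",
)
eclf.fit(docs, labels)
pred = eclf.predict(["good film", "bad film"])
print(pred)
```

Each pipeline keeps its own vectorizer, so per-estimator settings such as `max_features` or `ngram_range` can still differ, as they do in the original code.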

0 answers:

No answers yet