Is there a way to combine different classifiers into one in sklearn? I found the sklearn.ensemble
package. It contains different models, such as AdaBoost and RandomForest, but those use decision trees under the hood, and I would like to use different methods, such as SVM and logistic regression. Is this possible with sklearn?
Answer 0 (score: 2)
Do you just want to do majority voting? That is not implemented, afaik. But as I said, you can average the predict_proba scores. Or you can apply a LabelBinarizer to the predictions and average the resulting indicator matrices; that would implement a voting scheme.
Even if you are not interested in the probabilities, averaging the predicted probabilities will likely work better than a simple majority vote. It is hard to say without trying it out, though.
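Both ideas from this answer can be sketched in a few lines. This is a hypothetical example, not code from the answer: the breast cancer dataset and the three model choices are stand-ins for whatever data and classifiers you actually use, and SVC needs probability=True so that predict_proba is available.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

# Placeholder data -- substitute your own X and y
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

models = [LogisticRegression(max_iter=5000),
          SVC(probability=True, random_state=1),
          KNeighborsClassifier()]
for m in models:
    m.fit(X_train, y_train)

# Option 1: average the predict_proba scores and take the argmax
avg_proba = np.mean([m.predict_proba(X_test) for m in models], axis=0)
soft_pred = models[0].classes_[avg_proba.argmax(axis=1)]

# Option 2: binarize each model's hard predictions and average the
# indicator matrices -- with an odd number of models this is a majority vote
lb = LabelBinarizer().fit(y_train)
votes = np.mean([lb.transform(m.predict(X_test)) for m in models], axis=0)
hard_pred = lb.inverse_transform(votes)

print('soft voting accuracy:', accuracy_score(y_test, soft_pred))
print('hard voting accuracy:', accuracy_score(y_test, hard_pred))
```

Using an odd number of models avoids ties in the vote; with an even number you would need an explicit tie-breaking rule.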
Answer 1 (score: 0)
Yes, you can train different models on the same dataset and let each of them make its own predictions.
# Import functions to compute accuracy and split data
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Import models, including VotingClassifier meta-model
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.ensemble import VotingClassifier
# Set seed for reproducibility
SEED = 1
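The snippets below assume a train/test split (X_train, X_test, y_train, y_test) already exists; that is what train_test_split was imported for. As a hypothetical setup, with the breast cancer dataset standing in for the real data, the split could look like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

SEED = 1
# Placeholder data -- substitute your own feature matrix X and labels y
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED)
```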
Now instantiate these models:
# Instantiate lr
lr = LogisticRegression(random_state = SEED)
# Instantiate knn
knn = KNN(n_neighbors = 27)
# Instantiate dt
dt = DecisionTreeClassifier(min_samples_leaf = 0.13, random_state = SEED)
Then collect them in a list of (name, classifier) tuples, which will later be combined into a single meta-model.
classifiers = [('Logistic Regression', lr),
('K Nearest Neighbours', knn),
('Classification Tree', dt)]
Now iterate over this predefined list of classifiers with a for loop:
for clf_name, clf in classifiers:
    # Fit clf to the training set
    clf.fit(X_train, y_train)
    # Predict the test-set labels
    y_pred = clf.predict(X_test)
    # Evaluate clf's accuracy on the test set
    accuracy = accuracy_score(y_test, y_pred)
    print('{:s} : {:.3f}'.format(clf_name, accuracy))
Finally, we evaluate the performance of the voting classifier, which takes the outputs of the models defined in the list classifiers and assigns labels by majority voting.
# Voting Classifier
# Instantiate a VotingClassifier vc
vc = VotingClassifier(estimators = classifiers)
# Fit vc to the training set
vc.fit(X_train, y_train)
# Evaluate the test set predictions
y_pred = vc.predict(X_test)
# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)
print('Voting Classifier: {:.3f}'.format(accuracy))
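To connect this back to the SVM and logistic regression mentioned in the question: VotingClassifier also supports voting='soft', which averages predict_proba instead of counting hard votes. A minimal sketch, again using the breast cancer dataset as stand-in data (SVC needs probability=True for soft voting, and both estimators benefit from feature scaling, so they are wrapped in pipelines):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data -- substitute your own X and y
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

vc_soft = VotingClassifier(
    estimators=[
        ('svm', make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=1))),
        ('lr', make_pipeline(StandardScaler(),
                             LogisticRegression(random_state=1))),
    ],
    voting='soft')  # average predict_proba across estimators
vc_soft.fit(X_train, y_train)
acc = accuracy_score(y_test, vc_soft.predict(X_test))
print('Soft Voting Classifier: {:.3f}'.format(acc))
```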