Using scikit-learn's ensemble methods

Time: 2013-03-18 07:06:13

Tags: scikit-learn

Is there a way to combine different classifiers into one in sklearn? I found the sklearn.ensemble package. It contains different models, such as AdaBoost and RandomForest, but they use decision trees, and I would like to use different methods, such as SVM and logistic regression. Is that possible with sklearn?

2 answers:

Answer 0: (score: 2)

Do you just want to do majority voting? That is not implemented, AFAIK. But as I said, you can average the predict_proba scores. Or you can apply a LabelBinarizer to the predictions and average those. That would implement a voting scheme.

Even if you are not interested in the probabilities, averaging the predicted probabilities may work better than a simple vote. This is hard to tell without trying it out, though.
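The probability-averaging idea above can be sketched as follows. This is a minimal illustration, not code from the answer: the toy dataset from make_classification and the two chosen models are assumptions, and any model without predict_proba (e.g. a plain SVC) needs probability=True.

```python
# Sketch: "soft voting" by hand, i.e. averaging predict_proba across models.
# Dataset and model choices are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every model must expose predict_proba (SVC needs probability=True for that)
models = [LogisticRegression(max_iter=1000),
          SVC(probability=True, random_state=0)]
for m in models:
    m.fit(X_train, y_train)

# Average the predicted class probabilities across models,
# then assign each sample the class with the highest mean probability
avg_proba = np.mean([m.predict_proba(X_test) for m in models], axis=0)
y_pred = avg_proba.argmax(axis=1)
```

The same averaging works for any number of models, as long as their predict_proba outputs share the same class ordering (sklearn guarantees this via each estimator's classes_ attribute when they are trained on the same labels).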

Answer 1: (score: 0)

Yes, you can train different models on the same dataset and let each model make its own predictions.

# Import functions to compute accuracy and split data
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Import models, including VotingClassifier meta-model
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.ensemble import VotingClassifier

# Set seed for reproducibility
SEED = 1
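The snippets below use X_train, X_test, y_train and y_test without showing where they come from. A minimal setup might look like this; the breast-cancer dataset and the 70/30 split are illustrative assumptions, not part of the original answer:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Set seed for reproducibility
SEED = 1

# Load a toy binary-classification dataset (an assumption; substitute your own data)
X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=SEED)
```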

Now instantiate these models:

# Instantiate lr
lr = LogisticRegression(random_state = SEED)

# Instantiate knn
knn = KNN(n_neighbors = 27)

# Instantiate dt
dt = DecisionTreeClassifier(min_samples_leaf = 0.13, random_state = SEED)

Then define them as a list of classifiers, which will later be combined into a single meta-model:

classifiers = [('Logistic Regression', lr), 
               ('K Nearest Neighbours', knn), 
               ('Classification Tree', dt)]

Now iterate over this predefined list of classifiers with a for loop:

for clf_name, clf in classifiers:    

    # Fit clf to the training set
    clf.fit(X_train, y_train)    

    # Predict y_pred
    y_pred = clf.predict(X_test)

    # Calculate accuracy
    accuracy = accuracy_score(y_test, y_pred)

    # Evaluate clf's accuracy on the test set
    print('{:s} : {:.3f}'.format(clf_name, accuracy))

Finally, we evaluate the performance of the voting classifier, which takes the outputs of the models defined in the classifiers list and assigns labels by majority vote.

# Voting Classifier
# Instantiate a VotingClassifier vc
vc = VotingClassifier(estimators = classifiers)     

# Fit vc to the training set
vc.fit(X_train, y_train)   

# Evaluate the test set predictions
y_pred = vc.predict(X_test)

# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)
print('Voting Classifier: {:.3f}'.format(accuracy))
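VotingClassifier defaults to hard (majority) voting; passing voting='soft' averages predict_proba across the estimators instead, which is exactly the scheme suggested in the other answer. A self-contained sketch, with the iris dataset and hyperparameters as illustrative assumptions:

```python
# Sketch: VotingClassifier with voting='soft', which averages predict_proba
# across estimators instead of taking a majority vote.
# Dataset (iris) and hyperparameters are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

classifiers = [
    ('Logistic Regression', LogisticRegression(max_iter=1000)),
    ('K Nearest Neighbours', KNeighborsClassifier(n_neighbors=5)),
    ('Classification Tree', DecisionTreeClassifier(random_state=1)),
]

# voting='soft' requires every estimator to implement predict_proba
vc_soft = VotingClassifier(estimators=classifiers, voting='soft')
vc_soft.fit(X_train, y_train)
accuracy = accuracy_score(y_test, vc_soft.predict(X_test))
print('Soft Voting Classifier: {:.3f}'.format(accuracy))
```

Soft voting tends to help when the individual models output well-calibrated probabilities; with poorly calibrated estimators, hard voting can be the safer default.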