I don't know why none of the ensembles will run. Maybe some of the parameters are mixed up?
Forest cover type data:
X shape = (581012, 54)
y shape = (581012,)
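These shapes match scikit-learn's built-in covertype loader. A minimal loading sketch, assuming the data came from there (the question does not show the loading code):

from sklearn.datasets import fetch_covtype

# Downloads and caches the UCI forest covertype dataset on first use
covtype = fetch_covtype()
X, y = covtype.data, covtype.target
print(X.shape, y.shape)  # (581012, 54) (581012,)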
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import NearestCentroid
from sklearn.tree import DecisionTreeClassifier
classifier_names = ["logistic regression", "linear SVM", "nearest centroids", "decision tree"]
classifiers = [LogisticRegression, LinearSVC, NearestCentroid, DecisionTreeClassifier]
ensemble1 = VotingClassifier(classifiers)
ensemble2 = BaggingClassifier(classifiers)
ensemble3 = AdaBoostClassifier(classifiers)
ensembles = [ensemble1, ensemble2, ensemble3]
seed = 7
for ensemble in ensembles:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    for classifier in classifiers:
        model = ensemble(base_estimator=classifier, random_state=seed)
        results = model_selection.cross_val_score(ensemble, X, Y, cv=kfold)
        print(results.mean())
I expected the ensembles to run for each classifier, but the first ensemble never ran. I moved BaggingClassifier to the front instead, but it raised the same error: not callable.
Answer 0 (score: 0)
For VotingClassifier, the estimators should be a list of (name, model) tuples. Note that you created a model class there; a model instance has to be given inside the tuple.

From the scikit-learn docs:

estimators : list of (string, estimator) tuples
    Invoking the fit method on the VotingClassifier will fit clones of those original estimators, which will be stored in the class attribute self.estimators_. An estimator can be set to None using set_params.
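A minimal sketch of the expected format, with instantiated estimators each paired with a name:

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Each entry is a (name, estimator instance) tuple, not a bare class
voting = VotingClassifier(estimators=[
    ("logistic regression", LogisticRegression()),
    ("decision tree", DecisionTreeClassifier()),
])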
For the other two ensembles, you can only give a single base model; the ensemble then fits n_estimators copies of that same base model. Your code loops over the different classifiers, but it redefines the ensemble model on every iteration.

base_estimator : object or None, optional (default=None)
    The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.

n_estimators : int, optional (default=10)
    The number of base estimators in the ensemble.
Try this!
from sklearn import datasets, model_selection
from sklearn.ensemble import VotingClassifier, BaggingClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import NearestCentroid
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

classifier_names = ["logistic regression", "linear SVM", "nearest centroids", "decision tree"]
classifiers = [LogisticRegression(), LinearSVC(), NearestCentroid(), DecisionTreeClassifier()]

# VotingClassifier takes a list of (name, estimator instance) tuples
ensemble1 = VotingClassifier([(n, c) for n, c in zip(classifier_names, classifiers)])
# Bagging and AdaBoost take one base estimator plus the number of copies to fit
ensemble2 = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=10)
ensemble3 = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=10)

ensembles = [ensemble1, ensemble2, ensemble3]
seed = 7
for ensemble in ensembles:
    # shuffle=True is required when a random_state is set
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
    results = model_selection.cross_val_score(ensemble, X, y, cv=kfold)
    print(results.mean())
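Once the cross-validation scores look reasonable, the same estimators can be fitted and used directly; a small usage sketch with the voting ensemble from above:

ensemble1.fit(X, y)
print(ensemble1.predict(X[:5]))  # class labels chosen by majority (hard) vote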
Answer 1 (score: 0)
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import NearestCentroid
from sklearn.tree import DecisionTreeClassifier
import warnings
warnings.filterwarnings("ignore")

seed = 7
classifier_names = ["logistic regression", "linear SVM", "nearest centroids", "decision tree"]
classifiers = [LogisticRegression, LinearSVC, NearestCentroid, DecisionTreeClassifier]
for classifier in classifiers:
    # A random forest always bags decision trees; it does not take a base estimator
    ensemble1 = RandomForestClassifier(n_estimators=20, random_state=seed)
    # algorithm="SAMME" because the default SAMME.R needs predict_proba; note that
    # AdaBoost still rejects NearestCentroid, which cannot fit with sample weights
    ensemble2 = AdaBoostClassifier(base_estimator=classifier(), algorithm="SAMME",
                                   n_estimators=5, learning_rate=1, random_state=seed)
    ensemble3 = BaggingClassifier(base_estimator=classifier(),
                                  max_samples=0.5, n_estimators=20, random_state=seed)
    # Instantiate the classifiers; hard voting because LinearSVC and
    # NearestCentroid do not implement predict_proba (required for soft voting)
    ensemble4 = VotingClassifier([(n, c()) for n, c in zip(classifier_names, classifiers)],
                                 voting="hard")
    ensembles = [ensemble1, ensemble2, ensemble3, ensemble4]
    for ensemble in ensembles:
        # shuffle=True is required when a random_state is set
        kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
        # X and y are the covertype data from the question; a small slice keeps this fast
        results = model_selection.cross_val_score(ensemble, X[1:100], y[1:100], cv=kfold)
        print("Mean accuracy of {}: {:.3f}".format(type(ensemble).__name__, results.mean()))