
Question

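Could anyone check what is wrong with the following code? Did I make a mistake in any step of building the model? I have already added the 'clf__' prefix to the parameters.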

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline

clf = RandomForestClassifier()
pca = PCA()
pca_clf = make_pipeline(pca, clf)

kfold = KFold(n_splits=10, random_state=22)

parameters = {
    'clf__n_estimators': [4, 6, 9],
    'clf__max_features': ['log2', 'sqrt', 'auto'],
    'clf__criterion': ['entropy', 'gini'],
    'clf__max_depth': [2, 3, 5, 10],
    'clf__min_samples_split': [2, 3, 5],
    'clf__min_samples_leaf': [1, 5, 8],
}

grid_RF = GridSearchCV(pca_clf, param_grid=parameters,
                       scoring='accuracy', cv=kfold)
grid_RF = grid_RF.fit(X_train, y_train)
clf = grid_RF.best_estimator_
clf.fit(X_train, y_train)
grid_RF.best_score_

cv_result = cross_val_score(clf, X_train, y_train, cv=kfold,
                            scoring="accuracy")

cv_result.mean()

Answer 1

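You are making a wrong assumption about how make_pipeline works. From the documentation:

This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

This means that when you pass in a PCA object, its step name is set to 'pca' (lowercase), and when you pass in a RandomForestClassifier object, it is named 'randomforestclassifier', not 'clf' as you assumed.

So the parameter grid you built is invalid: its keys start with clf__, and no step with that name exists in the pipeline.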
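The naming behaviour described above can be checked directly. The following is a minimal sketch (the variable names `pipe`, `step_names` and `valid_keys` are only illustrative) that lists the step names make_pipeline generates and tests which grid keys the pipeline would accept:

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(PCA(), RandomForestClassifier())

# Step names are derived from the class names, lowercased.
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['pca', 'randomforestclassifier']

# Valid grid keys have the form "<step name>__<parameter name>".
valid_keys = pipe.get_params().keys()
print('randomforestclassifier__n_estimators' in valid_keys)  # True
print('clf__n_estimators' in valid_keys)  # False
```

Printing `pipe.get_params().keys()` is a quick way to see every key that GridSearchCV will accept for a given pipeline.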

Solution 1:

Replace this line:

pca_clf = make_pipeline(pca, clf)

with


pca_clf = Pipeline([('pca', pca), ('clf', clf)])
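As a rough end-to-end sketch of this first fix: the data below is a synthetic stand-in for X_train and y_train (an assumption, since the asker's data is not shown), and the grid is shortened so it runs quickly. `shuffle=True` is also added to KFold, because recent scikit-learn releases reject a `random_state` passed without shuffling.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the asker's data (assumption, for illustration only).
X_train, y_train = make_classification(n_samples=200, n_features=10,
                                       random_state=0)

# Explicit step names: the classifier step is now really called 'clf'.
pca_clf = Pipeline([('pca', PCA()),
                    ('clf', RandomForestClassifier(random_state=0))])

# random_state only has an effect when shuffle=True.
kfold = KFold(n_splits=3, shuffle=True, random_state=22)

# Shortened version of the original grid; the 'clf__' prefix now matches.
parameters = {
    'clf__n_estimators': [4, 6],
    'clf__max_depth': [2, 5],
}

grid_RF = GridSearchCV(pca_clf, param_grid=parameters,
                       scoring='accuracy', cv=kfold)
grid_RF.fit(X_train, y_train)
print(grid_RF.best_params_)
print(grid_RF.best_score_)
```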

Solution 2:

If you don't want to change the pca_clf = make_pipeline(pca, clf) line, replace every occurrence of clf in parameters with 'randomforestclassifier', like this:

parameters = {'randomforestclassifier__n_estimators': [4, 6, 9], 
              'randomforestclassifier__max_features': ['log2', 'sqrt','auto'],
              'randomforestclassifier__criterion': ['entropy', 'gini'], 
              'randomforestclassifier__max_depth': [2, 3, 5, 10], 
              'randomforestclassifier__min_samples_split': [2, 3, 5],
              'randomforestclassifier__min_samples_leaf': [1,5,8] }
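If the grid is long, the renaming can also be done programmatically instead of by hand. This is only a sketch: `old_parameters` is a hypothetical name for the original grid, and 'auto' is left out of max_features because newer scikit-learn releases no longer accept that value for RandomForestClassifier.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(PCA(), RandomForestClassifier())

# The original grid with the 'clf__' prefix (hypothetical variable name).
old_parameters = {
    'clf__n_estimators': [4, 6, 9],
    'clf__max_features': ['log2', 'sqrt'],
    'clf__criterion': ['entropy', 'gini'],
    'clf__max_depth': [2, 3, 5, 10],
    'clf__min_samples_split': [2, 3, 5],
    'clf__min_samples_leaf': [1, 5, 8],
}

# Swap the leading 'clf__' for the name make_pipeline actually generated.
parameters = {
    key.replace('clf__', 'randomforestclassifier__', 1): values
    for key, values in old_parameters.items()
}

# Every renamed key should now be a parameter the pipeline knows about.
unknown = [key for key in parameters if key not in pipe.get_params()]
print(unknown)  # []
```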

Sidenote: there is no need to do this in your code:

clf = grid_RF.best_estimator_
clf.fit(X_train, y_train)

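With the default refit=True, best_estimator_ has already been fitted on the whole training data using the best parameters found, so your call to clf.fit() is redundant.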
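A small sketch of that point, using synthetic data and a bare RandomForestClassifier rather than the full pipeline (both are assumptions made to keep the example short): after the search is fitted, best_estimator_ can predict straight away.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

# refit=True is the default: the best configuration is refitted on all of X, y.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {'n_estimators': [4, 6]}, cv=3)
grid.fit(X, y)

best = grid.best_estimator_

# A fitted forest exposes estimators_; no extra best.fit(X, y) is needed.
print(hasattr(best, 'estimators_'))  # True
predictions = best.predict(X)
print(predictions.shape)  # (120,)
```

Calling fit again would do no harm beyond the extra training time, but it adds nothing.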

Invalid parameter clf for estimator Pipeline in sklearn
