Question

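I built a pipeline with XGBoost and ran a randomized search over it, which gave me a best estimator. However, when I try to use this best estimator to predict on my test set, I get the following error: "ValueError: Specifying the columns using strings is only supported for pandas DataFrames".

Here is the pipeline code I used. Note: ct is just a ColumnTransformer that applies SimpleImputer and OneHotEncoder to the categorical columns, and SimpleImputer and StandardScaler to the numeric columns.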

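from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from xgboost import XGBRegressor

ml_step_1 = ('transform', ct)
ml_step_2 = ('pca', PCA())
ml_step_3 = ('xgb', XGBRegressor())
xgb_pipe = Pipeline([ml_step_1, ml_step_2, ml_step_3])
xgb = RandomizedSearchCV(xgb_pipe, xgb_param_grid, cv=kf, scoring='neg_mean_absolute_error')
xgb.fit(train_full_features, train_full_target)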

Running the pipeline above, these are the best parameters I got:

Best XGBoost parameters: {'xgb__silent': True, 'xgb__n_estimators': 1000, 'xgb__max_depth': 4, 'xgb__learning_rate': 0.09999999999999999, 'transform__num__imputer__strategy': 'median', 'transform__cat__imputer__strategy': 'most_frequent', 'pca__n_components': 68}

Now I take this best estimator and do the following:

test_full_imp = pd.DataFrame(xgb.best_estimator_.named_steps['transform'].transform(test_full))
test_final = xgb.best_estimator_.named_steps['pca'].transform(test_full_imp)
predictions = xgb.best_estimator_.predict(test_final)

Answer 1

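After a few attempts, I found what was wrong. Since test_final has already been passed through the transform and pca steps, call predict on the final xgb step only, not on the whole pipeline:

xgb.best_estimator_.named_steps['xgb'].predict(test_final)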
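
The reason this works is that a fitted Pipeline's predict re-applies every preprocessing step before calling the final estimator. Passing already-transformed data to the whole pipeline therefore sends a plain NumPy array into the ColumnTransformer, which cannot select columns by name from it. The sketch below illustrates the equivalence between the two valid ways of predicting. It is a minimal example under stated assumptions: the column names and toy data are invented, and scikit-learn's GradientBoostingRegressor stands in for XGBRegressor so the example runs without xgboost installed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; the column names are hypothetical, not from the original post.
rng = np.random.default_rng(0)
n = 60
train = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
target = 2.0 * train["num_a"] + rng.normal(scale=0.1, size=n)
test = train.sample(10, random_state=0).reset_index(drop=True)

# A ColumnTransformer shaped like the `ct` described in the question.
num_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
cat_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
ct = ColumnTransformer(
    [("num", num_pipe, ["num_a", "num_b"]), ("cat", cat_pipe, ["cat_a"])],
    sparse_threshold=0,  # always return a dense array for PCA
)

# Same step names as in the question; the regressor is a stand-in.
pipe = Pipeline([
    ("transform", ct),
    ("pca", PCA(n_components=2)),
    ("xgb", GradientBoostingRegressor(random_state=0)),
])
param_grid = {"xgb__n_estimators": [20, 40], "xgb__max_depth": [2, 3]}
search = RandomizedSearchCV(
    pipe, param_grid, n_iter=2,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_error", random_state=0,
)
search.fit(train, target)
best = search.best_estimator_

# Option 1: give the raw DataFrame to the whole pipeline.
pred_pipeline = best.predict(test)

# Option 2: transform step by step, then call only the final step.
step1 = best.named_steps["transform"].transform(test)
step2 = best.named_steps["pca"].transform(step1)
pred_manual = best.named_steps["xgb"].predict(step2)

print(np.allclose(pred_pipeline, pred_manual))
```

In practice, option 1 is usually the simpler choice: with the names from the question, that would be xgb.best_estimator_.predict(test_full), or equivalently xgb.predict(test_full), since the search object is refit on the best parameters by default.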

How do I use the best estimator from a pipeline to predict on the test set?
