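I have tried about 10 models and tuned the best 3. I am still not satisfied with the error, so I am trying ensembling (stacking, to be precise), or anything else you think is appropriate. I am open to suggestions.
I am trying stacking and want to figure out what the best meta-estimator is. The base learners are Extra Trees Regressor, XGBoost, and Random Forest.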
I have tried Voting Regressor and Bagging Regressor, and I also have a stacking class (I found the code online and understand the gist of it).
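For context, this is roughly what the built-in ensembles look like. It is only a minimal sketch: the data is synthetic, the two base learners are untuned stand-ins rather than my tuned models, and `RidgeCV` as the final estimator of scikit-learn's `StackingRegressor` is an assumption chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    ExtraTreesRegressor,
    RandomForestRegressor,
    StackingRegressor,
    VotingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption: not the data from this question).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Untuned stand-in base learners (assumption: the tuned models are not shown).
estimators = [
    ("et", ExtraTreesRegressor(n_estimators=30, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
]

# Voting averages the base predictions; stacking fits a meta-model on
# cross-validated base predictions.
voting = VotingRegressor(estimators=estimators)
stacking = StackingRegressor(estimators=estimators, final_estimator=RidgeCV(), cv=5)

scores = {}
for name, model in [("voting", voting), ("stacking", stacking)]:
    # The scorer returns negated RMSE, so flip the sign back.
    rmse = -cross_val_score(
        model, X, y, cv=3, scoring="neg_root_mean_squared_error"
    )
    scores[name] = rmse.mean()
    print(f"{name}: mean CV RMSE = {scores[name]:.2f}")
```

Scoring every ensemble with the same cross-validation call keeps the comparison with the single tuned models on equal footing.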
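I tuned the base models. However, when I stack them with Gradient Boosting as the meta-estimator, I actually get worse results than the best-performing single model: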
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin, clone
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold


class StackingAveragedModels(BaseEstimator, RegressorMixin, TransformerMixin):
    def __init__(self, base_models, meta_model, n_folds=5):
        self.base_models = base_models
        self.meta_model = meta_model
        self.n_folds = n_folds

    def fit(self, X, y):
        # One list of fitted fold clones per base model.
        self.base_models_ = [list() for _ in self.base_models]
        self.meta_model_ = clone(self.meta_model)
        kfold = KFold(n_splits=self.n_folds, shuffle=True)

        # Out-of-fold predictions become the meta-model's training features.
        # The positional indexing below assumes X and y are NumPy arrays.
        out_of_fold_predictions = np.zeros((X.shape[0], len(self.base_models)))
        for i, clf in enumerate(self.base_models):
            for train_index, holdout_index in kfold.split(X, y):
                instance = clone(clf)
                self.base_models_[i].append(instance)
                instance.fit(X[train_index], y[train_index])
                y_pred = instance.predict(X[holdout_index])
                out_of_fold_predictions[holdout_index, i] = y_pred

        self.meta_model_.fit(out_of_fold_predictions, y)
        return self

    def predict(self, X):
        # Average each base model's fold clones into one meta-feature column.
        meta_features = np.column_stack([
            np.column_stack([model.predict(X) for model in base_models]).mean(axis=1)
            for base_models in self.base_models_
        ])
        return self.meta_model_.predict(meta_features)


# a, pipe_rs_xgb and KRR are my tuned base models, defined elsewhere.
stacked_average_models = StackingAveragedModels(
    base_models=(a, pipe_rs_xgb, KRR),
    meta_model=GradientBoostingRegressor(n_estimators=50),
)
stacked_average_models.fit(x_train, y_train)
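As far as I understand it, the `predict` method above averages the fold-level clones of each base model into a single column, so the meta-model sees one feature per base model. Here is a toy sketch of just that step. `ConstantModel` is a hypothetical stand-in that always predicts a fixed value; it is not part of my actual code.

```python
import numpy as np


class ConstantModel:
    """Hypothetical stand-in for a fitted regressor: always predicts `value`."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(X.shape[0], self.value)


# Two base models, each with two fold clones (mirrors `self.base_models_`).
base_models_ = [
    [ConstantModel(1.0), ConstantModel(3.0)],
    [ConstantModel(10.0), ConstantModel(20.0)],
]

X = np.zeros((4, 2))  # four samples; the feature values are irrelevant here

# Same expression as in `predict`: average the clones, one column per base model.
meta_features = np.column_stack([
    np.column_stack([model.predict(X) for model in fold_models]).mean(axis=1)
    for fold_models in base_models_
])

print(meta_features.shape)  # (4, 2)
print(meta_features[0])     # [ 2. 15.]
```

So with three base models, the Gradient Boosting meta-estimator is trained on only three input columns.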
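The stacked model actually performs worse than I expected. I am still a beginner, so any pointers in the right direction would be appreciated!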