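How do I make GradientBoostingRegressor work with a BaseEstimator in scikit-learn?

Time: 2019-01-08 19:47:24

Tags: python scikit-learn

The gradient boosting models in sklearn support an init parameter, which lets you train an initial model and pass it to another model through init.

I am trying to use the same idea for regression. My code is below.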

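from sklearn.ensemble import GradientBoostingRegressor

gbm_base = GradientBoostingRegressor(random_state=1, verbose=True)
gbm_base.fit(X_train, y_train)
# Pass the fitted model as the initial estimator of a second model
gbm_withEstimator = GradientBoostingRegressor(init=gbm_base, random_state=1, verbose=True)
gbm_withEstimator.fit(X_train, y_train)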

But this gives me the following error.

~/anaconda3/lib/python3.6/site-packages/sklearn/ensemble/gradient_boosting.py in update_terminal_regions(self, tree, X, y, residual, y_pred, sample_weight, sample_mask, learning_rate, k)
    499         """
    500         # update predictions
--> 501         y_pred[:, k] += learning_rate * tree.predict(X).ravel()
    502 
    503     def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,

IndexError: too many indices for array

I think this fails because in regression y_pred is always a one-dimensional array, whereas the code here assumes it is two-dimensional.
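To illustrate that hypothesis: indexing a one-dimensional NumPy array with two indices, as the line in the traceback does, raises an IndexError of the same kind. This is only a minimal sketch with a made-up array (y_pred_1d is a hypothetical name), not the scikit-learn code itself.

```python
import numpy as np

# Hypothetical stand-in for a regressor's 1-D prediction vector
y_pred_1d = np.zeros(5)
k = 0

error_message = None
try:
    # Same indexing pattern as in the traceback: two indices on a 1-D array
    y_pred_1d[:, k] += 1.0
except IndexError as exc:
    error_message = str(exc)
    print(type(exc).__name__ + ":", error_message)
```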

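1 answer:

Answer 0 (score: 3)

This is a known bug. See the issue "GradientBoosting fails when using init estimator parameter" and the pull request "[MRG] FIX gradient boosting with sklearn estimator as init #12436" for more background.

In the meantime, you can subclass GradientBoostingRegressor to work around the problem, as follows: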

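import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.utils import check_array


class GBR_Init(GradientBoostingRegressor):
    def predict(self, X):
        # Validate the input, then return the output of _decision_function as-is
        X = check_array(X, dtype=np.float32, order='C', accept_sparse='csr')
        return self._decision_function(X)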

You can then use the GBR_Init class in place of GradientBoostingRegressor.
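As a rough sketch of why the shape of the returned predictions matters: the update in the traceback writes into y_pred[:, k], so it needs a two-dimensional array with one column per output. The helper update_column below is hypothetical and only mimics that update with plain NumPy; it is not scikit-learn's implementation.

```python
import numpy as np


def update_column(y_pred, tree_output, learning_rate=0.1, k=0):
    """Mimic the in-place update from the traceback on column k of y_pred."""
    y_pred[:, k] += learning_rate * np.asarray(tree_output, dtype=float).ravel()
    return y_pred


# 1-D vector, the shape a regressor's predict() normally returns
flat = np.zeros(4)
# Same values as a single column of shape (4, 1)
column = flat.reshape(-1, 1).copy()

update_column(column, [1.0, 2.0, 3.0, 4.0])
print(column.shape)
```

Calling update_column with the 1-D flat array instead fails with an IndexError, which is the situation described in the question.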

An example:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.ensemble import GradientBoostingRegressor as GBR
from sklearn.utils import check_array

class GBR_Init(GBR):
    def predict(self, X):
        X = check_array(X, dtype=np.float32, order='C', accept_sparse='csr')
        return self._decision_function(X)

boston = load_boston()
X = boston.data
y = boston.target
# Fit the base model that will later be passed as init
base = GBR_Init(random_state=1, verbose=True)
base.fit(X, y)
      Iter       Train Loss   Remaining Time
         1          71.3024            0.00s
         2          60.6243            0.00s
         3          51.6694            0.00s
         4          44.3657            0.00s
         5          38.2831            0.00s
         6          33.2863            0.00s
         7          28.9190            0.00s
         8          25.2967            0.18s
         9          22.2587            0.16s
        10          19.6923            0.14s
        20           8.3119            0.13s
        30           5.4763            0.07s
        40           4.1906            0.07s
        50           3.4663            0.05s
        60           3.0437            0.04s
        70           2.6753            0.03s
        80           2.4451            0.02s
        90           2.2376            0.01s
       100           2.0142            0.00s
GBR_Init(alpha=0.9, criterion='friedman_mse', init=None, learning_rate=0.1,
     loss='ls', max_depth=3, max_features=None, max_leaf_nodes=None,
     min_impurity_decrease=0.0, min_impurity_split=None,
     min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0,
     n_estimators=100, n_iter_no_change=None, presort='auto',
     random_state=1, subsample=1.0, tol=0.0001, validation_fraction=0.1,
     verbose=True, warm_start=False)
est = GBR_Init(init=base, random_state=1, verbose=True)
est.fit(X, y)
      Iter       Train Loss   Remaining Time
         1          71.3024            0.00s
         2          60.6243            0.00s
         3          51.6694            0.00s
         4          44.3657            0.00s
         5          38.2831            0.00s
         6          33.2863            0.00s
         7          28.9190            0.00s
         8          25.2967            0.18s
         9          22.2587            0.16s
        10          19.6923            0.14s
        20           8.3119            0.06s
        30           5.4763            0.07s
        40           4.1906            0.05s
        50           3.4663            0.05s
        60           3.0437            0.03s
        70           2.6753            0.03s
        80           2.4451            0.02s
        90           2.2376            0.01s
       100           2.0142            0.00s
      Iter       Train Loss   Remaining Time
         1           2.0069            0.00s
         2           1.9844            0.00s
         3           1.9729            0.00s
         4           1.9670            0.00s
         5           1.9409            0.00s
         6           1.9026            0.00s
         7           1.8850            0.00s
         8           1.8690            0.00s
         9           1.8450            0.00s
        10           1.8391            0.14s
        20           1.6879            0.06s
        30           1.5695            0.04s
        40           1.4469            0.05s
        50           1.3431            0.03s
        60           1.2329            0.03s
        70           1.1370            0.02s
        80           1.0616            0.02s
        90           0.9904            0.01s
       100           0.9228            0.00s
GBR_Init(alpha=0.9, criterion='friedman_mse',
     init=GBR_Init(alpha=0.9, criterion='friedman_mse', init=None, learning_rate=0.1,
     loss='ls', max_depth=3, max_features=None, max_leaf_nodes=None,
     min_impurity_decrease=0.0, min_impurity_split=None,
     min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0,
     n_estimators=100, n_iter_no_change=None, presort='auto',
     random_state=1, subsample=1.0, tol=0.0001, validation_fraction=0.1,
     verbose=True, warm_start=False),
     learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
     max_leaf_nodes=None, min_impurity_decrease=0.0,
     min_impurity_split=None, min_samples_leaf=1, min_samples_split=2,
     min_weight_fraction_leaf=0.0, n_estimators=100, n_iter_no_change=None,
     presort='auto', random_state=1, subsample=1.0, tol=0.0001,
     validation_fraction=0.1, verbose=True, warm_start=False)