AdaBoostRegressor with several base_estimators

Posted: 2018-07-02 19:03:08

Tags: python-3.x scikit-learn

How can I define an AdaBoostRegressor with several base_estimators? My code is below...

# read data and label from TrainFile.
data, label = file.reade_train_file(rouge, TrainFile)

tuned_parameters = [{'loss': ['exponential'],
                     'random_state': [47],
                     'learning_rate': [1]}]

base_models = [ExtraTreesRegressor(n_estimators=350,
                                   criterion='mse',
                                   max_features='log2',
                                   random_state=40),
               RandomForestRegressor(n_estimators=900,
                                     criterion='mse',
                                     max_features='sqrt',
                                     min_samples_split=3,
                                     random_state=40)]

clf = GridSearchCV(AdaBoostRegressor(base_models), tuned_parameters, cv=4)

clf.fit(data, label)

The error is:

Traceback (most recent call last):
  File "/home/aliasghar/MySumFarsi/sumFarsi/prjSumFarsi/Documents_References.py", line 956, in <module>
    documents_References.train(1)
  File "/home/aliasghar/MySumFarsi/sumFarsi/prjSumFarsi/Documents_References.py", line 886, in train
    self.get_best_AdaBoostRegressor_for_train(rouge,TrainFile)
  File "/home/aliasghar/MySumFarsi/sumFarsi/prjSumFarsi/Documents_References.py", line 289, in get_best_AdaBoostRegressor_for_train
    clf.fit(data,label)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 638, in fit
    cv.split(X, y, groups)))
  File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_validation.py", line 437, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/weight_boosting.py", line 960, in fit
    return super(AdaBoostRegressor, self).fit(X, y, sample_weight)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/weight_boosting.py", line 145, in fit
    random_state)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/weight_boosting.py", line 1006, in _boost
    estimator = self._make_estimator(random_state=random_state)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/base.py", line 126, in _make_estimator
    estimator.set_params(**dict((p, getattr(self, p))
AttributeError: 'list' object has no attribute 'set_params'

1 Answer:

Answer 0 (score: 2)

If I understand your question correctly, you want to apply GridSearchCV to AdaBoost with the option of using different base regressors. I think you are looking for something like the following.

First, define your list of base estimators:

base_models = [ExtraTreesRegressor(n_estimators=5,
                                   criterion='mse',
                                   max_features='log2',
                                   random_state=40),
               RandomForestRegressor(n_estimators=5,
                                     criterion='mse',
                                     max_features='sqrt',
                                     min_samples_split=3,
                                     random_state=40)]

Then define the parameters you want to tune, adding the base models as an additional parameter (also make sure the parameters are stored in a dictionary rather than a list):

tuned_parameters = {'base_estimator': base_models,
                    'loss': ['exponential'],
                    'random_state': [47],
                    'learning_rate': [1]}

clf = GridSearchCV(AdaBoostRegressor(), tuned_parameters, cv=4)
clf.fit(data, label)
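
After the search has finished, you can inspect which base estimator won and reuse the refitted model. This is a minimal sketch relying only on the standard GridSearchCV attributes (best_params_, best_score_, best_estimator_); data and label are the arrays from the question:

# parameter combination (including the chosen base estimator) with the best CV score
print(clf.best_params_['base_estimator'])
print(clf.best_score_)

# AdaBoostRegressor refitted on the full data with the winning parameters
best_model = clf.best_estimator_
predictions = best_model.predict(data)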

If you are trying to use several regressors at the same time then, as @Jan K suggested, that is not possible.
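
If you want to see how each base estimator fared during the search, the cv_results_ attribute of GridSearchCV holds the per-candidate scores. A small sketch, assuming pandas is available just to print a readable table:

import pandas as pd

# one row per parameter combination tried by the grid search
results = pd.DataFrame(clf.cv_results_)
print(results[['param_base_estimator', 'mean_test_score', 'rank_test_score']])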