我正在使用sklearn的GradientBoostingRegression方法。因此,在拟合了2000个估算器之后,我想为它添加更多的估算器。由于重新运行整个拟合过程需要很长时间,因此我使用了set_params()方法。请注意,这是一个多目标问题,这意味着我有3个目标可以适应。所以我使用以下代码添加更多估算器。
'''parameters: models (list of length 3 in our case )
train_X, train_y [n_samples x 3], test
n_estimators : previous + 500 (default) [additional estimators]
warm_start : True (default)
'''
def addMoreEstimators(train_X, train_y, test, models, n_estimators = 500, warm_start=True):
params = {'n_estimators':n_estimators, 'warm_start':warm_start}
gbm_pred= pd.DataFrame()
for (i,stars),clf in zip(enumerate(['*','**','***']), models):
clf.set_params(**params)
%time clf.fit(train_X.todense(),train_y[stars])
%time gbm_pred[stars] = clf.predict(test.todense())
gbm_pred = gbm_pred.as_matrix()
gbm_dict ={'model': gbm, 'prediction': gbm_pred}
return gbm_dict
注意:models
参数是3个目标的3个拟合模型的列表。
当我第一次使用2500(最初我有2000个估算器)运行它时,它运行正常,并给了我一个输出。
当我使用3000个估算器运行相同的函数时,我得到一个AttributeError
(参见下面错误的回溯)。这里的模型包含3个拟合模型。下面是错误的追溯:(它有点长)
AttributeError Traceback (most recent call last)
<ipython-input-104-9418ada3b36f> in <module>()
7 test = val_X_tfidf[:,shortened_col_index],
8 models = models,
----> 9 n_estimators = 3000)
10
11 reduced_features_gbm_pred_3000_2_lr_1_msp_2 = reduced_features_gbm_model_3000_2_lr_1_msp_2['prediction']
<ipython-input-103-e15a4fb70b50> in addMoreEstimators(train_X, train_y, test, models, n_estimators, warm_start)
15
16 clf.set_params(**params)
---> 17 get_ipython().magic(u'time clf.fit(train_X.todense(),train_y[stars])')
18 print 'starting prediction'
19
//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
2305 magic_name, _, magic_arg_s = arg_s.partition(' ')
2306 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2307 return self.run_line_magic(magic_name, magic_arg_s)
2308
2309 #-------------------------------------------------------------------------
//anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2226 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2227 with self.builtin_trap:
-> 2228 result = fn(*args,**kwargs)
2229 return result
2230
//anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
//anaconda/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
//anaconda/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
1160 if mode=='eval':
1161 st = clock2()
-> 1162 out = eval(code, glob, local_ns)
1163 end = clock2()
1164 else:
<timed eval> in <module>()
//anaconda/lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.pyc in fit(self, X, y, sample_weight, monitor)
973 self.estimators_.shape[0]))
974 begin_at_stage = self.estimators_.shape[0]
--> 975 y_pred = self._decision_function(X)
976 self._resize_state()
977
//anaconda/lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.pyc in _decision_function(self, X)
1080 # not doing input validation.
1081 score = self._init_decision_function(X)
-> 1082 predict_stages(self.estimators_, X, self.learning_rate, score)
1083 return score
1084
sklearn/ensemble/_gradient_boosting.pyx in sklearn.ensemble._gradient_boosting.predict_stages (sklearn/ensemble/_gradient_boosting.c:2502)()
AttributeError: 'int' object has no attribute 'tree_'
对于漫长的追溯感到抱歉,但我认为无法向我提供有意义的反馈。
再次,为什么我会收到这些反馈?
非常感谢任何帮助。
由于
修改
下面是生成models
的代码,该代码是上述函数中的输入之一。
from sklearn import ensemble
def updated_runGBM(train_X, train_y, test,
n_estimators =100,
max_depth = 1,
min_samples_split=1,
learning_rate=0.01,
loss= 'ls',
warm_start=True):
'''train_X : n_samples x m_features
train_y : n_samples x k_targets (multiple targets allowed)
test : n_samples x m_features
warm_start : True (originally the default is False, but I want to add trees)
'''
params = {'n_estimators': n_estimators, 'max_depth': max_depth, 'min_samples_split': min_samples_split,
'learning_rate': learning_rate, 'loss': loss,'warm_start':warm_start}
gbm1 = ensemble.GradientBoostingRegressor(**params)
gbm2 = ensemble.GradientBoostingRegressor(**params)
gbm3 = ensemble.GradientBoostingRegressor(**params)
gbm = [gbm1,gbm2,gbm3]
gbm_pred= pd.DataFrame()
for (i,stars),clf in zip(enumerate(['*','**','***']), gbm):
%time clf.fit(train_X.todense(),train_y[stars])
%time gbm_pred[stars] = clf.predict(test.todense())
gbm_pred = gbm_pred.as_matrix()
gbm_pred = np.clip(gbm_pred,0,np.inf)
gbm_dict ={'model': gbm, 'prediction': gbm_pred}
return gbm_dict
注意在上面的代码中,我删除了一些print语句以减少混乱。
这些是我正在使用的两个函数,没有别的(除了分割数据的代码)。