I am using xgboost as follows:
from xgboost import XGBClassifier
clf = XGBClassifier()
clf = clf.fit(df_train, df_train_labels, verbose=True)
This works fine. However, if I add the early_stopping_rounds parameter, like this:
clf = clf.fit(df_train, df_train_labels, early_stopping_rounds=10, verbose=True)
I get this error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-16-786925228ae5> in <module>()
9
10
---> 11 clf = clf.fit(df_train, df_train_labels, early_stopping_rounds=10, verbose=True)
12 print("after fit")
13 prediction = np.exp(clf.predict(df_test))
~/anaconda3/envs/python3/lib/python3.6/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose)
443 early_stopping_rounds=early_stopping_rounds,
444 evals_result=evals_result, obj=obj, feval=feval,
--> 445 verbose_eval=verbose)
446
447 self.objective = xgb_options["objective"]
~/anaconda3/envs/python3/lib/python3.6/site-packages/xgboost/training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, learning_rates, xgb_model, callbacks)
203 evals=evals,
204 obj=obj, feval=feval,
--> 205 xgb_model=xgb_model, callbacks=callbacks)
206
207
~/anaconda3/envs/python3/lib/python3.6/site-packages/xgboost/training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
99 end_iteration=num_boost_round,
100 rank=rank,
--> 101 evaluation_result_list=evaluation_result_list))
102 except EarlyStopException:
103 break
~/anaconda3/envs/python3/lib/python3.6/site-packages/xgboost/callback.py in callback(env)
190 def callback(env):
191 """internal function"""
--> 192 score = env.evaluation_result_list[-1][1]
193 if len(state) == 0:
194 init(env)
IndexError: list index out of range
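I looked into it and found that the fit method accepts several optional parameters, so I don't believe that merely adding early_stopping_rounds is what causes the problem.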
Any idea what is causing this error?
Answer 0 (score: 2)
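The error occurs because you have not specified an eval_set. xgboost evaluates the model on that set after each boosting round and uses the result to decide when to stop early. Without it there is nothing to evaluate, so the early-stopping callback fails.
See the documentation for the fit method:
eval_set (list, optional) – A list of (X, y) tuple pairs to use as a validation set for early stopping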
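To make the link to the traceback concrete, here is a minimal sketch that does not call xgboost at all. It imitates the one line from callback.py shown in the traceback (`score = env.evaluation_result_list[-1][1]`). The function name `last_score` and the metric name in the sample data are illustrative assumptions, not part of the library.

```python
# Simplified stand-in for the line in xgboost/callback.py from the traceback:
#     score = env.evaluation_result_list[-1][1]
# `last_score` is a hypothetical helper used only for this illustration.
def last_score(evaluation_result_list):
    """Return the score of the last (metric_name, score) pair."""
    return evaluation_result_list[-1][1]


# With a validation set, each round produces at least one (name, score) pair.
# The metric name below is made up for the example.
with_eval_set = [("validation_0-error", 0.12)]
print(last_score(with_eval_set))

# Without eval_set nothing is evaluated, so the list is empty and
# indexing it with [-1] raises the same IndexError as in the question.
try:
    last_score([])
except IndexError as exc:
    print("IndexError:", exc)
```

The second call fails for the same reason as the original fit call: indexing an empty list with [-1] is always out of range.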
For example, if you have split your data into training and test sets, you can use the following:
eval_set = [(X_test, y_test)]
clf = clf.fit(df_train,
df_train_labels,
eval_set=eval_set,
early_stopping_rounds=10,
verbose=True)
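The snippet above assumes that X_test and y_test already exist. If no hold-out data is available yet, one can be carved out of the training data. The following is a rough, standard-library-only sketch of that idea. The helper name `holdout_split`, the toy data, and the 20% fraction are assumptions made for illustration; in practice something like scikit-learn's train_test_split would normally do this job on the DataFrame directly.

```python
import random


def holdout_split(rows, labels, valid_fraction=0.2, seed=0):
    """Shuffle indices and split rows/labels into train and validation parts.

    Hypothetical helper for illustration; not part of xgboost.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_valid = max(1, int(len(rows) * valid_fraction))
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(rows, train_idx), pick(labels, train_idx),
            pick(rows, valid_idx), pick(labels, valid_idx))


# Toy data standing in for df_train / df_train_labels.
rows = [[i, i * 2] for i in range(10)]
labels = [i % 2 for i in range(10)]

X_train, y_train, X_valid, y_valid = holdout_split(rows, labels)

# eval_set is a list of (X, y) tuples, matching the documented shape.
eval_set = [(X_valid, y_valid)]
print(len(X_train), len(X_valid))
```

Note that, depending on the installed xgboost version, early_stopping_rounds may have to be passed to the XGBClassifier constructor instead of fit. It is worth checking the documentation for the version in use.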
Answer 1 (score: 0)
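The reason Chris gave is one cause. In addition, n_estimators (which is simply the number of times the modeling loop runs) comes into play when we want to find the right value for early_stopping_rounds. You have not specified the n_estimators parameter in your model. You should always specify both parameters in an XGB model.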
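How n_estimators and early_stopping_rounds interact can be sketched in plain Python. This is a simplified model of early stopping, not xgboost's actual implementation: it assumes a lower validation score is better, and `rounds_trained` and the score list are made up for illustration.

```python
def rounds_trained(validation_scores, n_estimators, early_stopping_rounds):
    """Return how many boosting rounds run before training stops.

    Simplified illustration: lower scores are better, and
    validation_scores must hold at least n_estimators entries.
    """
    best_score = float("inf")
    best_round = 0
    for round_idx in range(n_estimators):
        score = validation_scores[round_idx]
        if score < best_score:
            # New best validation score: remember where it happened.
            best_score, best_round = score, round_idx
        elif round_idx - best_round >= early_stopping_rounds:
            # No improvement for early_stopping_rounds rounds: stop early.
            return round_idx + 1
    # The round budget ran out before the stopping rule triggered.
    return n_estimators


# Made-up validation errors: they improve until round 2, then get worse.
scores = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40]

print(rounds_trained(scores, n_estimators=8, early_stopping_rounds=3))  # 6
print(rounds_trained(scores, n_estimators=4, early_stopping_rounds=3))  # 4
```

In this sketch n_estimators is only an upper bound on the number of rounds, while early_stopping_rounds decides whether training ends before that bound is reached. If n_estimators is too small, the stopping rule never gets a chance to trigger.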