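When I run LGBM with early stopping, it reports the scores that correspond to its best iteration.
When I try to reproduce those scores myself, I get different numbers.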
```python
import lightgbm as lgb
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

data = load_breast_cancer()
X = pd.DataFrame(data.data)
y = pd.Series(data.target)

lgb_params = {'boosting_type': 'dart', 'random_state': 42}
folds = KFold(5)

for train_idx, val_idx in folds.split(X):
    X_train, X_valid = X.iloc[train_idx], X.iloc[val_idx]
    y_train, y_valid = y.iloc[train_idx], y.iloc[val_idx]

    model = lgb.LGBMRegressor(**lgb_params, n_estimators=10000, n_jobs=-1)
    # fit() arguments as accepted by lightgbm 2.2.1 (verbose, early_stopping_rounds)
    model.fit(X_train, y_train,
              eval_set=[(X_valid, y_valid)],
              eval_metric='mae', verbose=-1, early_stopping_rounds=200)

    # After early stopping, predict() uses the best iteration by default
    y_pred_valid = model.predict(X_valid)
    print(mean_absolute_error(y_valid, y_pred_valid))
```
I expected

valid_0's l1: 0.123608

to match my own calculation with `mean_absolute_error`, but it doesn't. Indeed, here is the top of my output:

```
Training until validation scores don't improve for 200 rounds.
Early stopping, best iteration is:
[631]	valid_0's l2: 0.0515033	valid_0's l1: 0.123608
0.16287265537021847
```
I'm using lightgbm version '2.2.1'.
Answer
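If you update LightGBM, you will get

"UserWarning: Early stopping is not available in dart mode"

See this issue for details. In DART mode, trees added earlier are rescaled when later trees are dropped and normalized, so cutting the final model back to its best iteration does not give the same model that produced the best validation score during training. What you can do instead is retrain the model with the best number of boosting rounds.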
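As a rough illustration of why a truncated DART model disagrees with the score logged during training, here is a toy simulation in plain Python. It is not LightGBM's implementation: each "tree" is a single hypothetical scalar, and the normalization rule (new tree scaled by 1/(k+1), dropped trees scaled by k/(k+1), where k is the number of dropped trees) is an assumed simplification; LightGBM's real scaling also involves the learning rate. `train_toy_dart` and its parameters are made up for this sketch.

```python
import random


def train_toy_dart(n_rounds, target=1.0, learning_rate=0.3, drop_rate=0.5, seed=0):
    """Toy DART-style boosting on a single scalar target (hypothetical).

    Returns (trees, history): trees[i] is the final, possibly rescaled,
    contribution of tree i; history[t] is the ensemble prediction recorded
    right after round t + 1.
    """
    rng = random.Random(seed)
    trees = []
    history = []
    for _ in range(n_rounds):
        # Drop each existing tree independently with probability drop_rate
        dropped = {i for i in range(len(trees)) if rng.random() < drop_rate}
        k = len(dropped)
        # Fit the new tree to the residual of the ensemble WITHOUT the dropped trees
        kept = sum(c for i, c in enumerate(trees) if i not in dropped)
        new_tree = learning_rate * (target - kept)
        # Assumed normalization: dropped trees shrink by k/(k+1), new tree by 1/(k+1)
        for i in dropped:
            trees[i] *= k / (k + 1)
        trees.append(new_tree / (k + 1))
        history.append(sum(trees))
    return trees, history


trees, history = train_toy_dart(n_rounds=20)
for t in (1, 5, 10):
    # Score logged at round t vs. "keep only the first t trees" after training
    print(t, round(history[t - 1], 4), round(sum(trees[:t]), 4))
```

Because later drops shrink earlier trees, `sum(trees[:t])` generally drifts away from `history[t - 1]`, while the full ensemble still matches the last recorded value. In this toy, retraining for exactly t rounds with the same seed reproduces `history[t - 1]`, since the random drop sequence for those first t rounds is identical; that is the idea behind retraining with the best number of rounds.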
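```python
# Run inside the fold loop, right after the early-stopped fit in the question.
# Per-round validation MAE ('l1') recorded during that fit:
results = model.evals_result_['valid_0']['l1']
best_perf = min(results)
num_boost = results.index(best_perf) + 1  # index is 0-based; +1 gives the round count
print('with boost', num_boost, 'perf', best_perf)

# Retrain from scratch for exactly that many rounds, without early stopping
model = lgb.LGBMRegressor(**lgb_params, n_estimators=num_boost, n_jobs=-1)
model.fit(X_train, y_train, verbose=-1)
```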
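The round selection in the answer's snippet can be wrapped in a small helper that makes the 0-based to 1-based conversion explicit. `best_num_rounds` is a hypothetical helper, not part of LightGBM; it only assumes the metric list holds one value per boosting round, in order, like `model.evals_result_['valid_0']['l1']`.

```python
def best_num_rounds(metric_history, higher_is_better=False):
    """Return (n_rounds, best_value) for a per-round metric list.

    The list index is 0-based, so the number of rounds to retrain with is
    index + 1. Ties go to the earliest round, matching list.index.
    """
    if not metric_history:
        raise ValueError("metric_history is empty")
    best_value = max(metric_history) if higher_is_better else min(metric_history)
    return metric_history.index(best_value) + 1, best_value


# Hypothetical l1 values for five rounds (not real LightGBM output)
print(best_num_rounds([0.30, 0.21, 0.18, 0.19, 0.18]))  # → (3, 0.18)
```

The returned count is the value to pass as `n_estimators` when retraining; for metrics where larger is better (such as AUC), set `higher_is_better=True`.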