I am working on a binary classification problem. I built a model and got a local score of about 0.85391079298, but when I submitted my predictions I got a score of 73.55651. The evaluation metric is: 100 * (accuracy(actual score, predicted score)). I think my model is overfitting. Please help me figure out how to fix this. Here is my code:
from lightgbm import LGBMClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

errlgb = []
fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
for train_index, test_index in fold.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = LGBMClassifier(
        n_estimators=3000,
        learning_rate=0.18,
        num_leaves=200,
        colsample_bytree=.8,
        subsample=.9,
        reg_alpha=.1,
        reg_lambda=.1,
        min_split_gain=.01,
        min_child_weight=2
    )
    clf.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_test, y_test)],
            early_stopping_rounds=100, verbose=200)
    # probability of the positive class on the held-out fold
    preds = clf.predict_proba(X_test)[:, -1]
    print("err_lgb: ", roc_auc_score(y_test, preds))
    errlgb.append(roc_auc_score(y_test, preds))
    p = clf.predict(test_df)  # test-set predictions (overwritten each fold)
This is my output:
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[31] training's binary_logloss: 0.336613 valid_1's binary_logloss: 0.357114
err_lgb: 0.8699739045694661
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[26] training's binary_logloss: 0.337955 valid_1's binary_logloss: 0.358127
err_lgb: 0.8765486027393695
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[26] training's binary_logloss: 0.338397 valid_1's binary_logloss: 0.360132
err_lgb: 0.8732626771894375
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[29] training's binary_logloss: 0.336385 valid_1's binary_logloss: 0.35572
err_lgb: 0.8735276671812789
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[30] training's binary_logloss: 0.33504 valid_1's binary_logloss: 0.364632
err_lgb: 0.8684728203461445
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[30] training's binary_logloss: 0.33692 valid_1's binary_logloss: 0.340379
err_lgb: 0.8879300398147396
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[29] training's binary_logloss: 0.336804 valid_1's binary_logloss: 0.359437
err_lgb: 0.8743308686113594
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[29] training's binary_logloss: 0.337057 valid_1's binary_logloss: 0.357902
err_lgb: 0.8660940927927196
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[26] training's binary_logloss: 0.338251 valid_1's binary_logloss: 0.375692
err_lgb: 0.8508396983840842
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[21] training's binary_logloss: 0.338989 valid_1's binary_logloss: 0.379214
err_lgb: 0.8539107929829131
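One detail worth noting: the loop above scores each fold with ROC AUC, while the stated leaderboard metric is 100 * accuracy, so the two numbers are not directly comparable. Below is a minimal, self-contained sketch of computing the leaderboard-style metric on out-of-fold predictions next to the AUC. The synthetic data and the stand-in LogisticRegression are my assumptions, not part of the original code:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.linear_model import LogisticRegression  # stand-in for LGBMClassifier

# Synthetic binary-classification data (hypothetical, for illustration only)
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + rng.randn(200) * 0.5 > 0).astype(int)

# Collect out-of-fold probabilities so every sample is scored exactly once
oof_pred = np.zeros(len(y))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    oof_pred[test_idx] = clf.predict_proba(X[test_idx])[:, 1]

# AUC is threshold-free; the leaderboard metric thresholds at 0.5
auc = roc_auc_score(y, oof_pred)
score = 100 * accuracy_score(y, (oof_pred >= 0.5).astype(int))
print("OOF AUC:", auc)
print("OOF 100*accuracy:", score)
```

Comparing the out-of-fold 100*accuracy (rather than AUC) with the submission score makes it easier to tell whether the gap is really overfitting or just a metric mismatch.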