I'm doing binary classification for churn prediction, with log loss as the evaluation metric. The final boosting round reports a log loss of roughly 0.13 on both the training and validation sets. However, when I predict on the training set after training and evaluate the result, the reported training loss is noticeably higher. The validation and test sets report similar results.
import xgboost as xgb
import tensorflow as tf
from sklearn import model_selection

X_train, X_validation, y_train, y_validation = model_selection.train_test_split(
    X_train, y_train, test_size=0.3, random_state=111)
d_train = xgb.DMatrix(X_train, y_train)
d_valid = xgb.DMatrix(X_validation, y_validation)
watchlist = [(d_train,'train'), (d_valid,'valid')]
param = {
    'eta': 0.02,
    'max_depth': 7,
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'seed': 100,
    'silent': True
}
def xgb_score(preds, dtrain):
    # Custom eval: binary cross-entropy on the predicted probabilities
    labels = dtrain.get_label()
    bce = tf.keras.losses.BinaryCrossentropy()
    return 'log_loss', bce(labels, preds).numpy()
model = xgb.train(param, d_train, 300, watchlist, feval=xgb_score, maximize=False,
verbose_eval=10, early_stopping_rounds=50)
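For context, the custom feval above computes binary cross-entropy via tf.keras. The same quantity can be checked with plain NumPy; here is a minimal sketch on synthetic labels and probabilities (not my data), just to pin down what the metric should return:

```python
import numpy as np

def binary_log_loss(y_true, p, eps=1e-15):
    # Clip probabilities to avoid log(0), then average per-sample cross-entropy
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic sanity check: confident, correct predictions give a small loss
print(binary_log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]))
```
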
[0] train-auc:0.916884 valid-auc:0.9153 train-log_loss:0.676236 valid-log_loss:0.676282
Multiple eval metrics have been passed: 'valid-log_loss' will be used for early stopping.
Will train until valid-log_loss hasn't improved in 50 rounds.
[50] train-auc:0.92743 valid-auc:0.925878 train-log_loss:0.273744 valid-log_loss:0.275314
[100] train-auc:0.931298 valid-auc:0.929474 train-log_loss:0.170755 valid-log_loss:0.173269
[150] train-auc:0.936368 valid-auc:0.933754 train-log_loss:0.139556 valid-log_loss:0.143013
[200] train-auc:0.939979 valid-auc:0.936665 train-log_loss:0.129754 valid-log_loss:0.134133
[250] train-auc:0.942428 valid-auc:0.938327 train-log_loss:0.125872 valid-log_loss:0.131139
[299] train-auc:0.944344 valid-auc:0.939238 train-log_loss:0.123966 valid-log_loss:0.13004
Re-evaluating on the training set after training (evaluation_score is my own helper that computes AUC and log loss):

evaluation_score(model, X_train, y_train)
AUC : 79.92%
Log loss: 0.6529614925384521