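I am currently working on a machine learning project that predicts a binary class (negative: 0, positive: 1). The dataset is imbalanced: positives make up 0.1% of the samples.
I am running an xgboost model with gini as my performance metric. The problem is that it takes a large number of boosting rounds before the score starts to improve.
For example: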
[Fold 1/2]
[0] train-gini:-0.048192 validation-gini:-0.042979
Multiple eval metrics have been passed: 'validation-gini' will be used for early stopping.
Will train until validation-gini hasn't improved in 200 rounds.
[10] train-gini:-0.048192 validation-gini:-0.042979
[20] train-gini:-0.048192 validation-gini:-0.042979
[30] train-gini:-0.048192 validation-gini:-0.042979
[40] train-gini:-0.048192 validation-gini:-0.042979
[50] train-gini:-0.048192 validation-gini:-0.042979
[60] train-gini:-0.048192 validation-gini:-0.042979
[70] train-gini:-0.048192 validation-gini:-0.042979
[80] train-gini:-0.048192 validation-gini:-0.042979
[90] train-gini:0.197521 validation-gini:0.114222
[100] train-gini:0.247692 validation-gini:0.150601
[110] train-gini:0.2742 validation-gini:0.169023
[120] train-gini:0.278983 validation-gini:0.168095
[130] train-gini:0.316636 validation-gini:0.19118
[140] train-gini:0.347296 validation-gini:0.191045
[150] train-gini:0.368581 validation-gini:0.20094
[160] train-gini:0.374773 validation-gini:0.20906
[170] train-gini:0.398815 validation-gini:0.215193
[180] train-gini:0.426088 validation-gini:0.220467
[190] train-gini:0.439271 validation-gini:0.22249
[200] train-gini:0.455897 validation-gini:0.226621
[210] train-gini:0.469989 validation-gini:0.229512
[220] train-gini:0.485784 validation-gini:0.233432
[230] train-gini:0.496734 validation-gini:0.23747
[240] train-gini:0.503718 validation-gini:0.241804
[250] train-gini:0.51102 validation-gini:0.241841
[260] train-gini:0.523444 validation-gini:0.244312
[270] train-gini:0.530968 validation-gini:0.245467
[280] train-gini:0.538703 validation-gini:0.247433
[290] train-gini:0.546911 validation-gini:0.244196
[300] train-gini:0.553623 validation-gini:0.244161
[310] train-gini:0.561385 validation-gini:0.245099
[320] train-gini:0.571532 validation-gini:0.244787
[330] train-gini:0.578088 validation-gini:0.246146
[340] train-gini:0.585054 validation-gini:0.245624
[350] train-gini:0.591924 validation-gini:0.245463
[360] train-gini:0.596331 validation-gini:0.247517
[370] train-gini:0.600661 validation-gini:0.249465
[380] train-gini:0.606264 validation-gini:0.249034
[390] train-gini:0.611768 validation-gini:0.249182
[400] train-gini:0.617176 validation-gini:0.248239
[410] train-gini:0.621629 validation-gini:0.249248
[420] train-gini:0.626766 validation-gini:0.24975
[430] train-gini:0.631587 validation-gini:0.247824
[440] train-gini:0.636737 validation-gini:0.246586
[450] train-gini:0.641735 validation-gini:0.246552
[460] train-gini:0.649765 validation-gini:0.246332
[470] train-gini:0.654319 validation-gini:0.243546
[480] train-gini:0.659301 validation-gini:0.241965
[490] train-gini:0.665632 validation-gini:0.242562
[500] train-gini:0.669333 validation-gini:0.241306
[510] train-gini:0.673625 validation-gini:0.240314
[520] train-gini:0.678935 validation-gini:0.239846
[530] train-gini:0.683851 validation-gini:0.240029
[540] train-gini:0.685694 validation-gini:0.240691
[550] train-gini:0.689285 validation-gini:0.239974
[560] train-gini:0.691698 validation-gini:0.239079
[570] train-gini:0.694017 validation-gini:0.239407
Stopping. Best iteration:
[373] train-gini:0.60227 validation-gini:0.24996
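We can see that the train and validation scores only start to improve after round 80. This happens again even when I change the seed used for the split (although the round at which the score starts to change is different).
Has anyone run into this kind of problem?
Cheers, ASTRUS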
Answer 0 (score: 0)
No. But with only 0.1% positives, you may want to try setting the xgboost parameter scale_pos_weight (a float).
Maybe it will solve the problem. I would go with:
scale_pos_weight = 1000
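For what it's worth, the value of 1000 roughly matches the rule of thumb from the xgboost parameter docs, which suggest sum(negative instances) / sum(positive instances) as a typical starting point. Below is a minimal sketch of that calculation. The helper name `suggested_scale_pos_weight` and the toy labels are made up for illustration, and the other entries in `params` are placeholders rather than the asker's actual settings.

```python
def suggested_scale_pos_weight(labels):
    """Return count(negatives) / count(positives) for 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return negatives / positives


# Hypothetical labels with the 0.1% positive rate described in the question.
labels = [1] * 10 + [0] * 9990
ratio = suggested_scale_pos_weight(labels)

# Parameter dict in the form usually passed to xgboost.train.
# Everything except scale_pos_weight is a placeholder.
params = {
    "objective": "binary:logistic",
    "scale_pos_weight": ratio,
}
print(params)
```

With exactly 0.1% positives the ratio comes out at 999, so rounding to 1000 is fine. Treat it as a starting value to tune, not a guaranteed fix.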
Answer 1 (score: 0)
Have you tried changing eval_metric to logloss or error, as described in the xgboost documentation?
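To illustrate why a different metric might be more informative here, this is a rough sketch that reimplements logloss and error by hand, following their descriptions in the xgboost docs (error treats predictions above 0.5 as positive). The function names and the toy predictions are assumptions for illustration only; in an actual run you would just set eval_metric in the parameters.

```python
import math


def logloss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood for 0/1 labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def error_rate(labels, probs, threshold=0.5):
    """Fraction of samples misclassified at the given threshold."""
    wrong = sum(1 for y, p in zip(labels, probs) if (p > threshold) != (y == 1))
    return wrong / len(labels)


# Toy data: one positive among negatives, with constant predictions that
# drift from 0.5 toward the base rate (hypothetical values).
labels = [0, 0, 0, 1]
early = [0.5, 0.5, 0.5, 0.5]
later = [0.1, 0.1, 0.1, 0.1]

print(logloss(labels, early), logloss(labels, later))        # logloss decreases
print(error_rate(labels, early), error_rate(labels, later))  # error stays the same

# Placeholder parameter dict selecting the built-in metric.
params = {"objective": "binary:logistic", "eval_metric": "logloss"}
```

In this toy case logloss moves even though the ordering of the predictions does not change, while error stays flat because nothing crosses the 0.5 threshold. So on data this imbalanced, logloss is probably the more useful of the two for watching the early rounds.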