I fitted a model with xgboost. Its AUC is about 0.73, and I printed my last booster:
booster[599]:
0:[userkn_hometypecnt<22] yes=1,no=2,missing=1
  1:[userkn_60d_opencardniu_days<40] yes=3,no=4,missing=3
    3:[userkn_30d_opencardniu_days<13] yes=7,no=8,missing=7
      7:[userkn_60d_opencardniu_days<24] yes=15,no=16,missing=15
        15:[userkn_timeminperiod_firstday<1029] yes=29,no=30,missing=29
          29:leaf=0.000352735
          30:leaf=-0.0100666
        16:[userkn_rate_aopencardniusum_actiondaycnt<0.972506] yes=31,no=32,missing=31
          31:leaf=0.000398097
          32:leaf=-0.0129448
      8:[userkn_hometyperate<0.0977183] yes=17,no=18,missing=17
        17:leaf=0.0239075
        18:[userkn_rate_aopencardniusum_actiondaycnt<0.957994] yes=35,no=36,missing=35
          35:leaf=-0.00201536
          36:leaf=0.00858442
    4:[userkn_newacitoncntactiondayavg<8.82511] yes=9,no=10,missing=9
      9:[userkn_mingap_importcard_open<297306] yes=19,no=20,missing=19
        19:[userkn_rate_aopencardniusum_actiondaycnt<0.974763] yes=37,no=38,missing=37
          37:leaf=-0.0138254
          38:leaf=0.00521038
        20:[userkn_onlinetime_firstday<1961.5] yes=39,no=40,missing=39
          39:leaf=0.0247849
          40:leaf=-0.00297016
      10:[userkn_60d_opencardniu_days<59] yes=21,no=22,missing=21
        21:[userkn_rate_repeatcntmaxactionrepeatcnt_actioncnt<0.124787] yes=41,no=42,missing=41
          41:leaf=0.0101992
          42:leaf=-0.0222082
        22:leaf=0.0145614
  2:[userkn_hometyperate_firstday<0.25266] yes=5,no=6,missing=5
    5:[userkn_aenterapplyloanpagecntactiondayavg<0.787338] yes=11,no=12,missing=11
      11:[userkn_newacitoncntactiondayavg<8.48678] yes=23,no=24,missing=23
        23:[userkn_worktimeactionrate<0.36514] yes=43,no=44,missing=43
          43:leaf=-0.0178327
          44:leaf=0.0168168
        24:leaf=0.0254048
      12:[userkn_newacitontyperate_firstday<0.794737] yes=25,no=26,missing=25
        25:[userkn_newacitoncntactiondayavg<7.14581] yes=47,no=48,missing=47
          47:leaf=0.0175715
          48:leaf=-0.00748876
        26:leaf=0.0174804
    6:[userkn_aopencardniurate_firstday<0.0458042] yes=13,no=14,missing=13
      13:[userkn_avgperday_opencardniu_cnt<7.44167] yes=27,no=28,missing=27
        27:leaf=0.00171541
        28:leaf=-0.0229204
      14:leaf=0.00968641
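If I am right, the leaf values are log-odds, which can be converted to probabilities with the sigmoid function. But in this last booster, every leaf value converts to a probability of about 0.5. Does this mean every sample has a 50/50 chance of being labeled good or bad, so the result is no different from random guessing in binary classification? Am I right? Any other viewpoints are appreciated!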
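To make the observation concrete, here is a minimal sketch of the conversion I am describing. It applies the sigmoid to a handful of leaf values copied from booster[599], including the largest (0.0254048) and the smallest (-0.0229204) in this tree. The `sigmoid` helper and the `leaves` list are my own illustration, not anything produced by xgboost.

```python
import math


def sigmoid(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Leaf values copied by hand from the booster[599] dump
# (illustrative subset, includes the largest and smallest leaf).
leaves = [0.000352735, -0.0100666, 0.0239075, -0.0229204, 0.0254048]

# Convert each leaf on its own, as if it were the only tree in the model.
probs = [sigmoid(v) for v in leaves]

for v, p in zip(leaves, probs):
    print(f"leaf={v:+.9f} -> sigmoid={p:.4f}")
```

Every value printed lands within about 0.01 of 0.5. Note that this treats the leaves of booster[599] in isolation; it does not combine them with the outputs of the other 599 trees.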