Binning in R with constrained weight of evidence

Time: 2019-05-09 07:49:14

Tags: r binning

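In the following example from the `scorecard` package documentation, all variables are binned. However, if I look at the suggested bins for `age.in.years`, the default rate as a function of age follows a roller-coaster pattern (see the plot, or the `badprob` column). Can we impose the condition that the default rate decreases as age increases (a monotonic weight of evidence; with scorecard's sign convention the `woe` column would then decrease as well), while still choosing the bins that maximize information value? Any ideas?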

Many thanks
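For reference, the `woe` and `bin_iv` columns in the output below appear to follow these definitions, where $b_i$ and $g_i$ are the bad and good counts in bin $i$ and $B = 300$, $G = 700$ are the totals (checked by hand against row 1: $\ln\frac{80/300}{110/700} \approx 0.529$ and $(80/300 - 110/700) \times 0.529 \approx 0.0579$):

$$
\mathrm{WoE}_i = \ln\frac{b_i / B}{g_i / G}, \qquad
\mathrm{IV} = \sum_i \left(\frac{b_i}{B} - \frac{g_i}{G}\right)\mathrm{WoE}_i
$$

Under this convention a bin with a lower default rate has a lower WoE, so asking for a default rate that falls with age is the same as asking for a `woe` that falls with age.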

```r
library(scorecard)
# data preparing ------
# load germancredit data
data("germancredit")
# filter variables by missing rate, IV and identical-value rate
dt_f = var_filter(germancredit, y = "creditability")
# split dt into train and test sets
dt_list = split_df(dt_f, y = "creditability", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$creditability)
# woe binning ------
bins = woebin(dt_f, y = "creditability")
bins$age.in.years
#>        variable       bin count count_distr good bad   badprob        woe      bin_iv  total_iv breaks is_special_values
#> 1: age.in.years [-Inf,26)   190       0.190  110  80 0.4210526  0.5288441 0.057921024 0.1304985     26             FALSE
#> 2: age.in.years   [26,28)   101       0.101   74  27 0.2673267 -0.1609304 0.002528906 0.1304985     28             FALSE
#> 3: age.in.years   [28,35)   257       0.257  172  85 0.3307393  0.1424546 0.005359008 0.1304985     35             FALSE
#> 4: age.in.years   [35,37)    79       0.079   67  12 0.1518987 -0.8724881 0.048610052 0.1304985     37             FALSE
#> 5: age.in.years [37, Inf)   373       0.373  277  96 0.2573727 -0.2123715 0.016079553 0.1304985    Inf             FALSE
```
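One possible approach, sketched here under the assumption that the final bins may only use the candidate cut points `woebin` already proposed (26, 28, 35, 37): enumerate every subset of those breaks, merge the corresponding bins, discard any binning whose bad rate is not strictly decreasing with age, and keep the one with the largest IV. The good/bad counts are copied from the table above, and `best_monotone_breaks` is a hypothetical helper, not a scorecard function. With four candidate breaks there are only 2^4 = 16 binnings, so exhaustive search is cheap; it would not scale to hundreds of candidate cut points.

```r
# Hypothetical helper (not part of scorecard): exhaustive search over subsets
# of candidate breaks for the highest-IV binning whose bad rate is strictly
# decreasing from the youngest bin to the oldest one.
best_monotone_breaks <- function(breaks, good, bad) {
  # breaks: k candidate cut points; good, bad: counts in the k + 1 fine bins
  stopifnot(length(good) == length(breaks) + 1, length(bad) == length(good))
  G <- sum(good)
  B <- sum(bad)
  k <- length(breaks)
  best <- list(breaks = numeric(0), iv = 0)
  for (mask in 0:(2^k - 1)) {
    # keep[j] is TRUE when candidate break j is used in this binning
    keep <- bitwAnd(mask, 2^(seq_len(k) - 1)) > 0
    # merged-bin index of each fine bin = 1 + number of kept breaks to its left
    grp <- c(1, 1 + cumsum(keep))
    g <- as.vector(tapply(good, grp, sum))
    b <- as.vector(tapply(bad, grp, sum))
    if (any(g == 0 | b == 0)) next            # WoE undefined for empty classes
    badprob <- b / (g + b)
    if (any(diff(badprob) >= 0)) next         # bad rate must fall with age
    woe <- log((b / B) / (g / G))             # scorecard's sign convention
    iv <- sum((b / B - g / G) * woe)
    if (iv > best$iv) best <- list(breaks = breaks[keep], iv = iv)
  }
  best
}

# counts taken from bins$age.in.years above
res <- best_monotone_breaks(
  breaks = c(26, 28, 35, 37),
  good   = c(110, 74, 172, 67, 277),
  bad    = c(80, 27, 85, 12, 96)
)
res$breaks  # 26 35
res$iv      # roughly 0.100
```

On these counts the search keeps the breaks 26 and 35 (bad rates of about 0.42, 0.31 and 0.24), and the IV drops from 0.130 to roughly 0.100, which is the price of the monotonic constraint over these particular candidate cuts; searching finer candidates (for example every observed age) might recover some of it. The chosen breaks should then be usable through `woebin`'s `breaks_list` argument, e.g. `woebin(dt_f, y = "creditability", breaks_list = list(age.in.years = c(26, 35)))`, and checked with `woebin_plot()`.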

0 answers (no answers yet).