WOE transformation for validation data

Time: 2019-02-21 09:16:03

Tags: r logistic-regression

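I am fitting a logistic regression and an XGBoost model, and I have converted all variables to WOE (weight of evidence). This was done on the training dataset. Now I want to validate the model on validation and out-of-sample test data. The WOE values were generated by binning with Hmisc::cut2 and then applying InformationValue::WOE: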

data.work$MAILING_DAYS <- cut2(training$MAILING_DAYS, g=20, cuts=c(16,23,27))

data.work.woe$MAILING_DAYS <-  WOE(data.work$MAILING_DAYS,
                                   data.work$SUCCESS,
                                   valueOfGood=1)

The information available for each variable is:

 WOETable(data.work$MAILING_DAYS,
     data.work$SUCCESS,
     valueOfGood=1)
      CAT GOODS  BADS TOTAL      PCT_G     PCT_B         WOE          IV
1 [ 0,16)  4827 89389 94216 0.58844325 0.4983581  0.16616157 0.014968688
2 [16,23)  1750 41383 43133 0.21333658 0.2307169 -0.07832034 0.001361233
3 [23,27)   987 27323 28310 0.12032183 0.1523301 -0.23588003 0.007550120
4 [27,30]   639 21272 21911 0.07789833 0.1185948 -0.42030843 0.017105085

levels(data.work$MAILING_DAYS)
[1] "[ 0,16)" "[16,23)" "[23,27)" "[27,30]"
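
For reference, the WOE and IV columns in the table above follow directly from the GOODS and BADS counts: WOE is the natural log of PCT_G / PCT_B, and each bin's IV contribution is (PCT_G - PCT_B) * WOE. A minimal base-R sketch that recomputes them from the counts shown (the variable names here are only illustrative):

```r
# Counts copied from the WOETable output above, one entry per bin.
goods <- c(4827, 1750, 987, 639)
bads  <- c(89389, 41383, 27323, 21272)

# Share of all goods / all bads that falls into each bin.
pct_g <- goods / sum(goods)
pct_b <- bads / sum(bads)

# Weight of evidence and per-bin information value.
woe <- log(pct_g / pct_b)
iv  <- (pct_g - pct_b) * woe

# Compare with the WOE and IV columns of the table above.
print(data.frame(pct_g, pct_b, woe, iv))
```

This is why a training-derived mapping is enough for new data: once the bin edges and the per-bin WOE are stored, no target values from the validation set are needed.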

I tried something like this:

WOE <- data.frame(NAME=character(), 
                  COND=character(),
                  VALUE=integer(),
                  WOE =integer(),
                  stringsAsFactors=FALSE)
a = names(data.work)
WOE.CAT <- c()
WOE.WOE <- c()
k <- 1

for (i in c(4:4)){
  temp.var <- a[i]
  WOE.CAT <- WOETable(data.work[, temp.var], data.work$SUCCESS, valueOfGood = 1)$CAT
  WOE.WOE <- WOETable(data.work[, temp.var], data.work$SUCCESS, valueOfGood = 1)$WOE
  for (j in c(2:length(WOE.CAT))){
    if (as.integer(gregexpr(pattern=",", WOE.CAT[j]) == -1)){ 
      WOE[k,"NAME"]  <- temp.var
      WOE[k,"COND"]  <- "<"
      WOE[k,"VALUE"] <- as.numeric(WOE.CAT[j+1])
      WOE[k, "WOE"]  <- WOE.WOE[j]
      k <- k + 1     
    } else if (as.integer(gregexpr(pattern=",", WOE.CAT[j]) != -1)){ 
      if (j < (length(WOE.CAT)-1)){
        WOE[k,"NAME"]  <- temp.var
        WOE[k,"COND"]  <- "<"
        WOE[k,"VALUE"] <- as.numeric(substr(WOE.CAT[j], (as.integer(gregexpr(pattern=",", WOE.CAT[j]))+1), (nchar(WOE.CAT[j])-1)))
        WOE[k, "WOE"]  <- WOE.WOE[j]
        k <- k + 1
      } else if(j == (length(WOE.CAT)-1)){
        WOE[k,"NAME"]  <- temp.var
        WOE[k,"COND"]  <- ">="
        WOE[k,"VALUE"] <- as.numeric(substr(WOE.CAT[j], 2, (as.integer(gregexpr(pattern=",", WOE.CAT[j]))-1)))
        WOE[k, "WOE"]  <- WOE.WOE[j]
        k <- k + 1
      } 
    } else if (WOE.CAT[j] == "missing"){
      WOE[k,"NAME"]  <- temp.var
      WOE[k,"COND"]  <- "=="
      WOE[k,"VALUE"] <- NA
      WOE[k, "WOE"]  <- WOE.WOE[j]
      k <- k + 1
    }
  }
}

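There should be a way to carry the WOE mapping from the training data over to the validation data, shouldn't there?

The naive approach is a chain of if / else if ..., but I have more than 250 features, so that would take a lot of time!

Thanks a lot for your help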
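
One way to avoid hand-written if / else chains is to store, per feature, the bin edges and the WOE of each bin, and then look the WOE up for new data. The following is a minimal base-R sketch under stated assumptions, not the approach the packages above take internally: it uses base::cut rather than Hmisc::cut2, the data are simulated, and fit_woe, apply_woe and breaks_list are hypothetical names introduced for illustration.

```r
# Simulated stand-ins for the training and validation sets (assumption).
set.seed(1)
train <- data.frame(MAILING_DAYS = sample(0:30, 5000, replace = TRUE))
train$SUCCESS <- rbinom(5000, 1, ifelse(train$MAILING_DAYS < 16, 0.06, 0.03))
valid <- data.frame(MAILING_DAYS = sample(0:30, 1000, replace = TRUE))

# Learn the bins and their WOE on the training data only.
fit_woe <- function(x, y, breaks) {
  edges <- c(-Inf, breaks, Inf)
  bin   <- cut(x, edges, right = FALSE)   # left-closed [a, b) intervals
  goods <- tapply(y == 1, bin, sum)
  bads  <- tapply(y == 0, bin, sum)
  woe   <- as.numeric(log((goods / sum(goods)) / (bads / sum(bads))))
  list(edges = edges, woe = setNames(woe, levels(bin)))
}

# Apply stored bins to new data: same edges, WOE looked up by bin index.
apply_woe <- function(x, fit) {
  bin <- cut(x, fit$edges, right = FALSE)
  unname(fit$woe[as.integer(bin)])        # NA input stays NA
}

# One entry per feature; with 250+ features this list is the only thing to maintain.
breaks_list <- list(MAILING_DAYS = c(16, 23, 27))

fits <- lapply(setNames(nm = names(breaks_list)), function(v)
  fit_woe(train[[v]], train$SUCCESS, breaks_list[[v]]))

valid_woe <- as.data.frame(lapply(setNames(nm = names(fits)), function(v)
  apply_woe(valid[[v]], fits[[v]])))

head(valid_woe)
```

The outer bins are left open (-Inf and Inf) so that validation values outside the training range still fall into a bin. A bin with no goods or no bads in training would give an infinite WOE, which this sketch does not handle.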

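1 Answer:

Answer 0: (score: 0)

In the end I found a solution.

The scorecard package is very useful here. In my case I took the breaks determined with scorecard::woebin, passed them on as breaks_list, and used scorecard::woebin_ply to create the WOE-transformed data.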
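
A rough sketch of that workflow, assuming the scorecard package is installed. The data are simulated and the break points are simply the ones from the question; only woebin, its breaks_list argument, and woebin_ply come from the answer. Note that scorecard's sign convention for WOE may differ from the one used by InformationValue, so the values should not be mixed between the two packages.

```r
# Simulated stand-ins for the training and validation sets (assumption).
set.seed(1)
train <- data.frame(MAILING_DAYS = sample(0:30, 5000, replace = TRUE))
train$SUCCESS <- rbinom(5000, 1, ifelse(train$MAILING_DAYS < 16, 0.06, 0.03))
valid <- data.frame(MAILING_DAYS = sample(0:30, 1000, replace = TRUE))
valid$SUCCESS <- rbinom(1000, 1, ifelse(valid$MAILING_DAYS < 16, 0.06, 0.03))

if (requireNamespace("scorecard", quietly = TRUE)) {
  # Fit the bins on the training data, forcing the break points per variable.
  bins <- scorecard::woebin(
    train, y = "SUCCESS", x = "MAILING_DAYS",
    breaks_list = list(MAILING_DAYS = c(16, 23, 27))
  )

  # Apply the training bins to both data sets; the transformed
  # columns carry a "_woe" suffix.
  train_woe <- scorecard::woebin_ply(train, bins)
  valid_woe <- scorecard::woebin_ply(valid, bins)

  print(head(valid_woe))
}
```

Because the bins object is fitted once on the training data, the same call to woebin_ply can be reused for the out-of-sample test set, and breaks_list can hold one entry per feature.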