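I am fitting a logistic regression and an XGBoost model, and I have converted all variables to WOE (weight of evidence).
This was done on the training dataset.
Now I want to validate the model on the validation and out-of-sample test data.
The WOE values were generated by binning with Hmisc::cut2 and then applying InformationValue::WOE: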
data.work$MAILING_DAYS <- cut2(training$MAILING_DAYS, g=20, cuts=c(16,23,27))
data.work.woe$MAILING_DAYS <- WOE(data.work$MAILING_DAYS,
data.work$SUCCESS,
valueOfGood=1)
The information I have available is:
WOETable(data.work$MAILING_DAYS,
data.work$SUCCESS,
valueOfGood=1)
      CAT GOODS  BADS TOTAL      PCT_G     PCT_B         WOE          IV
1 [ 0,16)  4827 89389 94216 0.58844325 0.4983581  0.16616157 0.014968688
2 [16,23)  1750 41383 43133 0.21333658 0.2307169 -0.07832034 0.001361233
3 [23,27)   987 27323 28310 0.12032183 0.1523301 -0.23588003 0.007550120
4 [27,30]   639 21272 21911 0.07789833 0.1185948 -0.42030843 0.017105085
levels(data.work$MAILING_DAYS)
[1] "[ 0,16)" "[16,23)" "[23,27)" "[27,30]"
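For reference, the PCT_G, PCT_B, WOE and IV columns above can be reproduced from the GOODS and BADS counts alone. The following is a minimal base R sketch, assuming WOE is the natural log of the ratio of the two shares and IV is the per-bin term (PCT_G - PCT_B) * WOE, which is what the printed numbers are consistent with. The variable names are made up for illustration.

```r
# Counts copied from the WOETable output above
goods <- c(4827, 1750, 987, 639)
bads  <- c(89389, 41383, 27323, 21272)

pct_g <- goods / sum(goods)  # share of all goods that fall in each bin
pct_b <- bads / sum(bads)    # share of all bads that fall in each bin

woe <- log(pct_g / pct_b)        # natural log of the ratio of shares
iv  <- (pct_g - pct_b) * woe     # per-bin contribution to information value

print(round(data.frame(pct_g, pct_b, woe, iv), 6))
print(sum(iv))  # total information value of MAILING_DAYS
```

The point of spelling this out is that the WOE of a bin depends only on the training counts, so once the bin edges are fixed it is a plain lookup value that can be reused on any other dataset.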
I tried something like this:
WOE <- data.frame(NAME=character(),
                  COND=character(),
                  VALUE=integer(),
                  WOE =integer(),
                  stringsAsFactors=FALSE)
a = names(data.work)
WOE.CAT <- c()
WOE.WOE <- c()
k <- 1
for (i in c(4:4)){
  temp.var <- a[i]
  WOE.CAT <- WOETable(data.work[, temp.var], data.work$SUCCESS, valueOfGood = 1)$CAT
  WOE.WOE <- WOETable(data.work[, temp.var], data.work$SUCCESS, valueOfGood = 1)$WOE
  for (j in c(2:length(WOE.CAT))){
    if (as.integer(gregexpr(pattern=",", WOE.CAT[j]) == -1)){
      WOE[k,"NAME"] <- temp.var
      WOE[k,"COND"] <- "<"
      WOE[k,"VALUE"] <- as.numeric(WOE.CAT[j+1])
      WOE[k, "WOE"] <- WOE.WOE[j]
      k <- k + 1
    } else if (as.integer(gregexpr(pattern=",", WOE.CAT[j]) != -1)){
      if (j < (length(WOE.CAT)-1)){
        WOE[k,"NAME"] <- temp.var
        WOE[k,"COND"] <- "<"
        WOE[k,"VALUE"] <- as.numeric(substr(WOE.CAT[j], (as.integer(gregexpr(pattern=",", WOE.CAT[j]))+1), (nchar(WOE.CAT[j])-1)))
        WOE[k, "WOE"] <- WOE.WOE[j]
        k <- k + 1
      } else if(j == (length(WOE.CAT)-1)){
        WOE[k,"NAME"] <- temp.var
        WOE[k,"COND"] <- ">="
        WOE[k,"VALUE"] <- as.numeric(substr(WOE.CAT[j], 2, (as.integer(gregexpr(pattern=",", WOE.CAT[j]))-1)))
        WOE[k, "WOE"] <- WOE.WOE[j]
        k <- k + 1
      }
    } else if (WOE.CAT[j] == "missing"){
      WOE[k,"NAME"] <- temp.var
      WOE[k,"COND"] <- "=="
      WOE[k,"VALUE"] <- NA
      WOE[k, "WOE"] <- WOE.WOE[j]
      k <- k + 1
    }
  }
}
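There should be a way to carry the WOE values over from the training data to the validation data, shouldn't there?
The naive approach would be a chain of if / else if statements, but I have more than 250 features, so that would take a lot of time!
Thank you very much for your help.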
Answer 0 (score: 0)
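I finally found a solution.
The scorecard package is very useful here. In my case, I took the breaks (breaks_list) determined by scorecard::woebin on the training data and used scorecard::woebin_ply to create the WOE-transformed data.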
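To make the idea concrete without depending on any package, here is a minimal base R sketch of the same two steps: fix the breaks and the per-bin WOE on the training data, then apply both unchanged to new data. This is a swapped-in illustration, not the scorecard implementation. The toy data, the helper apply_woe and the column name MAILING_DAYS_woe are all hypothetical. The commented lines at the end show roughly how the scorecard calls named above would be used; they are untested here and the argument values are assumptions.

```r
# Hypothetical toy data standing in for the real training and validation sets
set.seed(1)
training <- data.frame(MAILING_DAYS = sample(0:30, 2000, replace = TRUE))
training$SUCCESS <- rbinom(2000, 1, plogis(-1 - 0.05 * training$MAILING_DAYS))
validation <- data.frame(MAILING_DAYS = c(3, 16, 25, 30, NA))

# 1. Fix the bin edges once (same cut points as in the question).
#    Open-ended outer edges so unseen values in new data still get a bin.
breaks <- c(-Inf, 16, 23, 27, Inf)
train_bin <- cut(training$MAILING_DAYS, breaks = breaks, right = FALSE)

# 2. Build the WOE lookup from the training data only
goods <- tapply(training$SUCCESS == 1, train_bin, sum)
bads  <- tapply(training$SUCCESS == 0, train_bin, sum)
woe_lookup <- setNames(
  as.vector(log((goods / sum(goods)) / (bads / sum(bads)))),
  levels(train_bin)
)

# 3. Bin new data with the same breaks and look up the training WOE
apply_woe <- function(x, breaks, woe_lookup) {
  bins <- cut(x, breaks = breaks, right = FALSE)
  unname(woe_lookup[as.character(bins)])  # missing inputs stay NA
}
validation$MAILING_DAYS_woe <- apply_woe(validation$MAILING_DAYS, breaks, woe_lookup)
print(validation)

# Rough scorecard equivalent (assumed usage, not run here):
# bins  <- scorecard::woebin(training, y = "SUCCESS",
#                            breaks_list = list(MAILING_DAYS = c(16, 23, 27)))
# valid <- scorecard::woebin_ply(validation, bins)
```

For 250+ features the same pattern extends by keeping one breaks vector and one lookup per variable in named lists and looping over the names. Missing values are left as NA in this sketch; if the training data has a dedicated "missing" bin, it would need its own lookup entry.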