My data frame looks like this: it holds cutpoint values (with their comparison operators) and the corresponding WoE values:
Cutpoint <- c("<= 0","<= 2","<= 4.5","<= 8","> 8","Missing")
WoE <- c("0.12","0.24","0.45","0.55","0.92","0.99")
dictionary <- data.frame(Cutpoint,WoE)
  Cutpoint  WoE
1     <= 0 0.12
2     <= 2 0.24
3   <= 4.5 0.45
4     <= 8 0.55
5      > 8 0.92
6  Missing 0.99
The other table looks like this:
val_A <- c("a","a","b","b","c","c","c","d")
val_B <- c("6","-1","3",NA,"7","8",NA,"9")
table <- data.frame(val_A,val_B)
  val_A val_B
1     a     6
2     a    -1
3     b     3
4     b  <NA>
5     c     7
6     c     8
7     c  <NA>
8     d     9
What I want is to look up each val_B value in my dictionary, so that my result table looks like this:
  val_A table_B
1     a    0.55
2     a    0.12
3     b    0.45
4     b    0.99
5     c    0.55
6     c    0.55
7     c    0.99
8     d    0.92
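Any hints are much appreciated.
Answer 0 (score: 1)
This is most easily done by stripping the comparison operators and using the fact that every cutpoint except the last two is a "<=" bound.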
Set up the data:
Cutpoint <- c("<= 0","<= 2","<= 4.5","<= 8","> 8","Missing")
WoE <- c("0.12","0.24","0.45","0.55","0.92","0.99")
## stringsAsFactors=FALSE is *essential* here -- or
## use options(stringsAsFactors=FALSE) to set globally
dictionary <- data.frame(Cutpoint,WoE,stringsAsFactors=FALSE)
val_A <- c("a","a","b","b","c","c","c","d")
val_B <- c("6","-1","3",NA,"7","8",NA,"9")
table <- data.frame(val_A,val_B,stringsAsFactors=FALSE)
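As a small illustration of why the comment above insists on stringsAsFactors=FALSE: in R versions before 4.0.0 the default was TRUE, so val_B would have been stored as a factor, and as.numeric() on a factor returns the internal level codes rather than the values. The vector f below is just a made-up example, not part of the original data:

```r
# A factor holding numbers as labels (what val_B becomes with stringsAsFactors=TRUE)
f <- factor(c("6", "-1", "3"))

# as.numeric() on a factor returns the integer level codes, not the labels
codes <- as.numeric(f)

# Converting via as.character() first recovers the actual values
values <- as.numeric(as.character(f))

print(codes)
print(values)
```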
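Strip the comparison operators and coerce the cutpoints to numeric (the "Missing" entry becomes NA, with a coercion warning):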
cuts <- as.numeric(gsub("(<=|>)","",dictionary$Cutpoint))
Set up the vector of breakpoints:
cuts2 <- c(-Inf,head(cuts,-2),Inf) ## all but last 2 vals of 'cuts', + Inf
Find the numeric category:
cc <- cut(as.numeric(table$val_B),breaks=cuts2)
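To check that the bins behave like the "<=" rows of the dictionary, here is a minimal sketch on a few made-up probe values (x_probe is my own example, not data from the question). By default cut() uses right = TRUE, so each interval includes its upper bound, which is what "<= 0", "<= 2" and so on require:

```r
# Breakpoints as built above, written out literally so this runs on its own
cuts2 <- c(-Inf, 0, 2, 4.5, 8, Inf)

# Probe values, including ones that sit exactly on a boundary
x_probe <- c(-1, 0, 3, 8, 9)

# right = TRUE is the default: intervals are open on the left, closed on the right
bins <- cut(x_probe, breaks = cuts2)

# Integer bin index = row of the dictionary to use
codes <- as.integer(bins)

print(data.frame(x_probe, bins, codes))
```

Boundary values such as 0 and 8 land in the "<= 0" and "<= 8" bins respectively, and anything above 8 falls into the fifth bin.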
Replace the NA values with the last category:
cc2 <- replace(as.numeric(cc),is.na(cc),nrow(dictionary))
Now do the lookup:
data.frame(val_A,table_B=as.numeric(WoE)[cc2])
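If you need this more than once, the steps can be bundled into a function. This is only a sketch: lookup_woe is a hypothetical name of my own, and it assumes the dictionary is laid out exactly as in the question (ascending "<=" rows, then one ">" row, then a "Missing" row as the last row).

```r
# Hypothetical helper, not a library function. Assumes the dictionary layout
# from the question: ascending "<= x" rows, one "> x" row, "Missing" last.
lookup_woe <- function(values, dictionary) {
  cutpoint <- as.character(dictionary$Cutpoint)
  woe <- as.numeric(as.character(dictionary$WoE))
  # Strip the operators; "Missing" turns into NA, hence suppressWarnings()
  cuts <- suppressWarnings(as.numeric(gsub("(<=|>)", "", cutpoint)))
  # Drop the last two entries ("> x" and "Missing") and close with Inf
  breaks <- c(-Inf, head(cuts, -2), Inf)
  # as.character() first, so factor input is converted by label, not by code
  bin <- as.integer(cut(as.numeric(as.character(values)), breaks = breaks))
  # NA inputs are mapped to the final ("Missing") row
  bin[is.na(bin)] <- nrow(dictionary)
  woe[bin]
}

dictionary <- data.frame(
  Cutpoint = c("<= 0", "<= 2", "<= 4.5", "<= 8", "> 8", "Missing"),
  WoE = c("0.12", "0.24", "0.45", "0.55", "0.92", "0.99"),
  stringsAsFactors = FALSE
)
val_B <- c("6", "-1", "3", NA, "7", "8", NA, "9")

res <- lookup_woe(val_B, dictionary)
print(res)
```

Because the function converts through as.character(), it should also tolerate factor columns, at the cost of relying entirely on the row order of the dictionary.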