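VLOOKUP in R when the operators are strings

Time: 2016-09-06 21:19:16

Tags: r

My data frame looks like this; it holds the cutpoint values, each with its comparison operator, and the matching WoE values: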

Cutpoint <- c("<= 0","<= 2","<= 4.5","<= 8","> 8","Missing")
WoE <- c("0.12","0.24","0.45","0.55","0.92","0.99")
dictionary <- data.frame(Cutpoint,WoE)

  Cutpoint  WoE
1     <= 0 0.12
2     <= 2 0.24
3   <= 4.5 0.45
4     <= 8 0.55
5      > 8 0.92
6  Missing 0.99

Another table looks like this:

val_A <- c("a","a","b","b","c","c","c","d")
val_B <- c("6","-1","3",NA,"7","8",NA,"9")
table <- data.frame(val_A,val_B)

  val_A val_B
1     a     6
2     a    -1
3     b     3
4     b  <NA>
5     c     7
6     c     8
7     c  <NA>
8     d     9

What I want is to look up each val_B value in my dictionary, so that my result table looks like this:

  val_A table_B
1     a    0.55
2     a    0.12
3     b    0.45
4     b    0.99
5     c    0.55
6     c    0.55
7     c    0.99
8     d    0.92

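Any hints are much appreciated.

1 answer:

Answer 0 (score: 1)

This can be done, most easily by stripping off the comparison operators and using the fact that all cutpoints except the last two are "<=" bounds.

Set up the data: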

Cutpoint <- c("<= 0","<= 2","<= 4.5","<= 8","> 8","Missing")
WoE <- c("0.12","0.24","0.45","0.55","0.92","0.99")
## stringsAsFactors=FALSE is *essential* here in R < 4.0.0 (it is the
##  default since R 4.0.0) -- or use options(stringsAsFactors=FALSE)
##  to set it globally in those older versions
dictionary <- data.frame(Cutpoint,WoE,stringsAsFactors=FALSE)

val_A <- c("a","a","b","b","c","c","c","d")
val_B <- c("6","-1","3",NA,"7","8",NA,"9")
table <- data.frame(val_A,val_B,stringsAsFactors=FALSE)
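Why stringsAsFactors matters: before R 4.0.0, data.frame() turned character columns into factors by default, and as.numeric() on a factor returns the internal level codes, not the numbers printed on screen. A minimal sketch of the pitfall, using a made-up factor `f` that is not part of the original data:

```r
# Hypothetical factor holding numbers stored as text
f <- factor(c("0.55", "0.12", "0.99"))

# as.numeric() on a factor returns the integer level codes
codes <- as.numeric(f)

# Going through character first recovers the actual numbers
vals <- as.numeric(as.character(f))

print(codes)
print(vals)
```

With factor columns, the later as.numeric() calls on val_B and WoE would silently produce level codes instead of the intended values.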

Strip the comparison operators and coerce the cutpoints to numeric:

cuts <- as.numeric(gsub("(<=|>)","",dictionary$Cutpoint))
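To see what this step yields: the five numeric cutpoints survive, while "Missing" cannot be parsed and becomes NA, so R emits a coercion warning that is expected here. A small sketch that repeats the step on a standalone vector; the suppressWarnings() wrapper and the name `stripped` are optional additions, not part of the original answer:

```r
Cutpoint <- c("<= 0", "<= 2", "<= 4.5", "<= 8", "> 8", "Missing")

# Remove the comparison operators; as.numeric() tolerates the leading space
stripped <- gsub("(<=|>)", "", Cutpoint)

# "Missing" is not a number, so it is coerced to NA (warning silenced here)
cuts <- suppressWarnings(as.numeric(stripped))
print(cuts)
```

The value 8 appears twice because "<= 8" and "> 8" share the same boundary, which is why the next step drops the last two entries.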

Set up the vector of breakpoints:

cuts2 <- c(-Inf,head(cuts,-2),Inf) ## -Inf, all but the last 2 vals of 'cuts', then Inf

Find the numeric category of each value:

cc <- cut(as.numeric(table$val_B),breaks=cuts2)
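By default cut() builds right-closed intervals (right = TRUE), which is what makes it agree with the "<=" cutpoints: a value sitting exactly on a cutpoint, such as 8, lands in the bin that ends at 8 rather than in the "> 8" bin. A short sketch with illustrative inputs (the vector `x` is made up for this example):

```r
breaks <- c(-Inf, 0, 2, 4.5, 8, Inf)

# Illustrative values on and around the cutpoints, plus a missing value
x <- c(0, 0.1, 8, 8.1, NA)

bins <- cut(x, breaks = breaks)  # right = TRUE is the default
print(levels(bins))
print(as.integer(bins))
```

Here 0 falls in the first bin, 8 in the fourth, 8.1 in the fifth, and NA stays NA, which the next step handles.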

Replace NA values with the index of the last category ("Missing"):

cc2 <- replace(as.numeric(cc),is.na(cc),nrow(dictionary))

Now do the lookup:

data.frame(val_A=table$val_A,table_B=as.numeric(dictionary$WoE)[cc2])
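The steps above can be collected into one helper. This is a sketch rather than part of the original answer: the function name `lookup_woe` is hypothetical, and it assumes, as the answer does, that every row except the last two is a "<=" cutpoint, that the second-to-last row is the open-ended ">" bin, and that the last row is "Missing".

```r
lookup_woe <- function(values, dictionary) {
  # Strip comparison operators; "Missing" becomes NA (warning silenced)
  cuts <- suppressWarnings(
    as.numeric(gsub("(<=|>)", "", dictionary$Cutpoint))
  )
  # Upper bounds of the "<=" bins, closed off with -Inf and Inf
  breaks <- c(-Inf, head(cuts, -2), Inf)
  # Bin index for each value; NA inputs stay NA
  idx <- as.integer(cut(as.numeric(values), breaks = breaks))
  # Send NA inputs to the last dictionary row ("Missing")
  idx[is.na(idx)] <- nrow(dictionary)
  as.numeric(dictionary$WoE)[idx]
}

dictionary <- data.frame(
  Cutpoint = c("<= 0", "<= 2", "<= 4.5", "<= 8", "> 8", "Missing"),
  WoE = c("0.12", "0.24", "0.45", "0.55", "0.92", "0.99"),
  stringsAsFactors = FALSE
)
val_B <- c("6", "-1", "3", NA, "7", "8", NA, "9")

result <- lookup_woe(val_B, dictionary)
print(result)
```

If the dictionary ever mixes operators in a different order, the positional assumption breaks and the breaks vector would need to be built from the operators themselves.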