Compare two data.tables row by row and add a new column

Date: 2019-07-24 06:54:09

Tags: r data.table

I have two data.tables. I want to compare each row of the first against the second and add a new column.

library(data.table)

DT1 <- data.table(ID=c("F","A","E","B","C","D","C"),
                  num=c(59,3,108,11,22,54,241),
                  value=c(90,47,189,72,42,86,280))

DT2 <- data.table(Mark=c("Mary","Abner","Bonnie","Trista","Norman"),
                  numA=c(48,20,88,237,10),
                  numB=c(60,326,54,268,89),
                  valueA=c(78,34,78,270,60),
                  valueB=c(92,190,90,385,75))

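My goal:

For each row of DT1, I want to find the row of DT2 where num lies between numA and numB and value lies between valueA and valueB.

For example:

For row F of DT1, with num = 59 and value = 90, both of these conditions must hold:

num (59) > DT2$numA (48) and num (59) < DT2$numB (60)
value (90) > DT2$valueA (78) and value (90) < DT2$valueB (92)

Match! So add a new column named result whose value is the Mark from DT2.

If there is no match, set it to "Undefined".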
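To make the rule concrete, here is a small sketch that tests row F (num = 59, value = 90) against every row of DT2. The DT2 columns are copied from the question as plain vectors so the snippet runs without any packages; the names `hit`, `num` and `value` are only illustrative.

```r
# DT2 columns from the question, as plain vectors
Mark   <- c("Mary", "Abner", "Bonnie", "Trista", "Norman")
numA   <- c(48, 20, 88, 237, 10)
numB   <- c(60, 326, 54, 268, 89)
valueA <- c(78, 34, 78, 270, 60)
valueB <- c(92, 190, 90, 385, 75)

# Row F of DT1
num   <- 59
value <- 90

# TRUE for every DT2 row whose two open ranges both contain the DT1 values
hit <- num > numA & num < numB & value > valueA & value < valueB

Mark[hit]
# [1] "Mary"  "Abner"
```

Row F falls inside Mary's ranges and also inside Abner's, so the rule as stated can match more than one DT2 row. Which one ends up in result depends on how ties are resolved.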

Desired result:

DT3 <- data.table(ID=c("F","A","E","B","C","D","C"),
              num=c(59,3,108,11,22,54,241),
              value=c(90,47,189,72,42,86,280),
              result=c("Mary","Undefined","Abner","Norman",
                       "Abner","Abner","Trista"))

How can I make sure every row is compared and the new column is added?

2 answers:

Answer 0: (score: 7)

A data.table option, using a non-equi update join:

DT1[DT2, on=.(num > numA, num < numB, value > valueA, value < valueB), Mark := i.Mark]

 DT1
   ID num value   Mark
1:  F  59    90  Abner
2:  A   3    47   <NA>
3:  E 108   189  Abner
4:  B  11    72 Norman
5:  C  22    42  Abner
6:  D  54    86  Abner
7:  C 241   280 Trista
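The join above names the new column Mark and leaves the unmatched row as NA, whereas the question asks for a column called result with "Undefined" for no match. A minimal sketch of one way to cover both, assuming the same DT1 and DT2 as in the question:

```r
library(data.table)

DT1 <- data.table(ID = c("F", "A", "E", "B", "C", "D", "C"),
                  num = c(59, 3, 108, 11, 22, 54, 241),
                  value = c(90, 47, 189, 72, 42, 86, 280))

DT2 <- data.table(Mark = c("Mary", "Abner", "Bonnie", "Trista", "Norman"),
                  numA = c(48, 20, 88, 237, 10),
                  numB = c(60, 326, 54, 268, 89),
                  valueA = c(78, 34, 78, 270, 60),
                  valueB = c(92, 190, 90, 385, 75))

# Same non-equi update join, but writing into a column called `result`
DT1[DT2, on = .(num > numA, num < numB, value > valueA, value < valueB),
    result := i.Mark]

# Rows that matched nothing in DT2 are still NA; label them explicitly
DT1[is.na(result), result := "Undefined"]

DT1
```

Rows F and D fall inside both Mary's and Abner's ranges. As in the output above, the update join keeps the later DT2 row (Abner) for them, so this differs from a first-match rule.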

Answer 1: (score: 2)

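I'm sure this can be solved more efficiently with one of the join operations in data.table, but here is a base R option using mapply. When several DT2 rows match, it takes the first one.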

DT1$result <- mapply(function(x, y) {
   inds <- x > DT2$numA & x < DT2$numB & y > DT2$valueA & y < DT2$valueB
   if(any(inds))
     DT2$Mark[which.max(inds)]
   else "Undefined"
}, DT1$num, DT1$value)


DT1
#   ID num value    result
#1:  F  59    90      Mary
#2:  A   3    47 Undefined
#3:  E 108   189     Abner
#4:  B  11    72    Norman
#5:  C  22    42     Abner
#6:  D  54    86      Mary
#7:  C 241   280    Trista
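The two answers disagree on rows F and D (Abner versus Mary). A short sketch that lists every matching Mark per DT1 row shows why, assuming the same DT1 and DT2 as in the question; the name `all_hits` is only illustrative.

```r
library(data.table)

DT1 <- data.table(ID = c("F", "A", "E", "B", "C", "D", "C"),
                  num = c(59, 3, 108, 11, 22, 54, 241),
                  value = c(90, 47, 189, 72, 42, 86, 280))

DT2 <- data.table(Mark = c("Mary", "Abner", "Bonnie", "Trista", "Norman"),
                  numA = c(48, 20, 88, 237, 10),
                  numB = c(60, 326, 54, 268, 89),
                  valueA = c(78, 34, 78, 270, 60),
                  valueB = c(92, 190, 90, 385, 75))

# For each DT1 row, collect all DT2 Marks whose ranges contain num and value
all_hits <- mapply(function(x, y) {
  inds <- x > DT2$numA & x < DT2$numB & y > DT2$valueA & y < DT2$valueB
  paste(DT2$Mark[inds], collapse = ", ")
}, DT1$num, DT1$value)

data.table(ID = DT1$ID, matches = all_hits)
```

Rows F and D each match both Mary and Abner. The mapply answer reports the first match in DT2 order (Mary), while the update join ends up with the last one (Abner). The desired result in the question uses Mary for F but Abner for D, so the tie-breaking rule would need to be pinned down before choosing between the two approaches.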