I have two data.tables, and I want to compare them row by row and add a new column.
DT1 <- data.table(ID=c("F","A","E","B","C","D","C"),
num=c(59,3,108,11,22,54,241),
value=c(90,47,189,72,42,86,280))
DT2 <- data.table(Mark=c("Mary","Abner","Bonnie","Trista","Norman"),
numA=c(48,20,88,237,10),
numB=c(60,326,54,268,89),
valueA=c(78,34,78,270,60),
valueB=c(92,190,90,385,75))
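My goal:
For each row of DT1, I want to find the row of DT2 where num lies between numA and numB and value lies between valueA and valueB.
For example:
For row F in DT1, with num = 59 and value = 90, all of the following must hold:
num (59) > DT2$numA (48) and num (59) < DT2$numB (60) and value (90) > DT2$valueA (78) and value (90) < DT2$valueB (92)
That is a match, so add a new column named result whose value is the matching Mark from DT2.
If there is no match, set it to "Undefined".
How can I make sure every row is compared and the new column is added?
Desired result:
DT3 <- data.table(ID=c("F","A","E","B","C","D","C"),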
num=c(59,3,108,11,22,54,241),
value=c(90,47,189,72,42,86,280),
result=c("Mary","Undefined","Abner","Norman",
"Abner","Abner","Trista"))
Answer 0 (score: 7)
A data.table option:
DT1[DT2, on=.(num > numA, num < numB, value > valueA, value < valueB), Mark := i.Mark]
DT1
   ID num value   Mark
1:  F  59    90  Abner
2:  A   3    47   <NA>
3:  E 108   189  Abner
4:  B  11    72 Norman
5:  C  22    42  Abner
6:  D  54    86  Abner
7:  C 241   280 Trista
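Two things are worth noting about this output. Rows without any matching range are left as NA rather than "Undefined", and when a DT1 row falls inside more than one DT2 range (rows F and D match both Mary and Abner), the update join appears to keep the value from the last matching row of DT2. Below is a minimal sketch of filling in the label the question asks for. It assumes the column should be called result, as in the desired output, instead of Mark:

```r
library(data.table)

DT1 <- data.table(ID    = c("F","A","E","B","C","D","C"),
                  num   = c(59,3,108,11,22,54,241),
                  value = c(90,47,189,72,42,86,280))
DT2 <- data.table(Mark   = c("Mary","Abner","Bonnie","Trista","Norman"),
                  numA   = c(48,20,88,237,10),
                  numB   = c(60,326,54,268,89),
                  valueA = c(78,34,78,270,60),
                  valueB = c(92,190,90,385,75))

# Same non-equi update join as above, writing into `result` (assumed name)
DT1[DT2, on = .(num > numA, num < numB, value > valueA, value < valueB),
    result := i.Mark]

# Rows that matched no range are still NA; label them as the question asks
DT1[is.na(result), result := "Undefined"]

DT1
```

The second step is an ordinary update by reference, so it can be dropped if NA is an acceptable marker for "no match".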
Answer 1 (score: 2)
I'm sure this can be solved more efficiently with one of the join operations in data.table, but here is a base R option using mapply:
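DT1$result <- mapply(function(x, y) {
  # TRUE for every DT2 row whose ranges contain both num (x) and value (y)
  inds <- x > DT2$numA & x < DT2$numB & y > DT2$valueA & y < DT2$valueB
  if (any(inds))
    DT2$Mark[which.max(inds)]  # which.max() returns the first TRUE, i.e. the first match
  else "Undefined"
}, DT1$num, DT1$value)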
DT1
#   ID num value    result
#1:  F  59    90      Mary
#2:  A   3    47 Undefined
#3:  E 108   189     Abner
#4:  B  11    72    Norman
#5:  C  22    42     Abner
#6:  D  54    86      Mary
#7:  C 241   280    Trista
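This result differs from the data.table answer on rows F and D because those two rows fall inside both Mary's and Abner's ranges: which.max(inds) picks the first match, while the update join ends up with the last one. A quick way to see every match per row is sketched below, reusing the same four conditions (all_marks is just an illustrative name, not something from the question):

```r
library(data.table)

DT1 <- data.table(ID    = c("F","A","E","B","C","D","C"),
                  num   = c(59,3,108,11,22,54,241),
                  value = c(90,47,189,72,42,86,280))
DT2 <- data.table(Mark   = c("Mary","Abner","Bonnie","Trista","Norman"),
                  numA   = c(48,20,88,237,10),
                  numB   = c(60,326,54,268,89),
                  valueA = c(78,34,78,270,60),
                  valueB = c(92,190,90,385,75))

# For each DT1 row, collect every DT2 Mark whose ranges contain num and value
all_marks <- mapply(function(x, y) {
  inds <- x > DT2$numA & x < DT2$numB & y > DT2$valueA & y < DT2$valueB
  paste(DT2$Mark[inds], collapse = ", ")  # empty string when nothing matches
}, DT1$num, DT1$value)

data.table(ID = DT1$ID, all_marks)
# Rows F and D list "Mary, Abner"; row A is empty; the rest have a single match
```

If the ranges in DT2 are meant to be non-overlapping, a check like this is a cheap way to spot rows where the choice between first and last match matters.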