How to find the closest larger value when joining data.tables

Time: 2018-07-19 01:47:31

Tags: r join data.table

I have the following 2 data.tables:

DT1 <- data.table(A = c(100,50,10), B = c("Good","Ok","Bad"))
DT1
     A    B
1: 100 Good
2:  50   Ok
3:  10  Bad

DT2 <- data.table(A = c(99,34,5,"",24,86))
DT2
    A
1: 99
2: 34
3:  5
4:   
5: 24
6: 86   

When joining DT1 and DT2, this is what I want to get:

DT2
    A       B
1: 99    Good
2: 34    Ok
3:  5    Bad
4:       NA
5: 24    Ok
6: 86    Good

The "roll" option in data.table only does "nearest" matching, so it doesn't work in my case. Is there any way to do this kind of lookup with data.table?

2 answers:

Answer 0 (score: 2)

A rolling join does work for me if it rolls backwards (NOCB = next observation carried backward):

library(data.table)
DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))
DT2 <- data.table(A = c(99, 34, 5, "", 24, 86))

DT2[, A := as.numeric(A)]
DT1[DT2, on = "A", roll = -Inf]
    A    B
1: 99 Good
2: 34   Ok
3:  5  Bad
4: NA <NA>
5: 24   Ok
6: 86 Good
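The question's assumption that `roll` only supports nearest matching can be checked by running both modes side by side. A minimal sketch, assuming the question's data with the empty entry simply dropped so that DT2$A is numeric: `roll = -Inf` takes the next larger DT1$A, while `roll = "nearest"` takes the closest DT1$A in either direction.

```r
library(data.table)

DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))
# Assumption: the "" entry is left out so that A stays numeric
DT2 <- data.table(A = c(99, 34, 5, 24, 86))

# NOCB: each DT2$A gets the row with the next larger (or equal) DT1$A
nocb <- DT1[DT2, on = "A", roll = -Inf]

# Nearest: each DT2$A gets the row whose DT1$A is closest in either direction
nearest <- DT1[DT2, on = "A", roll = "nearest"]

# Both results keep DT2's row order, so they can be compared row by row
data.table(A = DT2$A, nocb = nocb$B, nearest = nearest$B)
```

The two columns agree except for 24, which `roll = "nearest"` sends to 10 ("Bad") because 10 is closer than 50, while `roll = -Inf` sends it to 50 ("Ok") as the question wants.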

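Note that the rolling join only works if both A columns are numeric (or integer). By using "", the OP turned DT2$A into a character column, which is why it is converted with as.numeric() first.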
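Another option is to avoid the coercion in the first place by creating DT2$A as numeric, with NA in place of the empty string. A minimal sketch, assuming NA is an acceptable stand-in for "" (this is not part of the original answer); it also copies the looked-up label back into DT2 as a new column B, matching the layout shown in the question.

```r
library(data.table)

# Mixing numbers and "" in c() silently turns the whole vector into character
stopifnot(is.character(c(99, 34, 5, "", 24, 86)))

DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))
# Build the lookup column as numeric from the start; NA marks the missing entry
DT2 <- data.table(A = c(99, 34, 5, NA, 24, 86))

# Rolling join (NOCB); the result keeps DT2's row order
matched <- DT1[DT2, on = "A", roll = -Inf]

# Add the matched labels to DT2 by reference
DT2[, B := matched$B]
DT2
```

The non-missing rows come out as Good, Ok, Bad, Ok, Good; the NA row stays unmatched, as in the answer's output.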

Answer 1 (score: 1)

Here is a base R approach:

df1 <- as.data.frame(DT1)
df2 <- as.data.frame(DT2)

df2$B <- apply(df2, 1, function(x) {
    if(x != "") df1$B[which.min(abs(as.numeric(x) - df1$A))] else NA
})
df2
#    A    B
# 1 99 Good
# 2 34   Ok
# 3  0  Bad
# 4    <NA>
# 5 24  Bad
# 6 86 Good

Or using data.tables:

DT2[, B := apply(DT2, 1, function(x) 
    if(x != "") DT1$B[which.min(abs(as.numeric(x) - DT1$A))] else NA)]
DT2
#    A    B
#1: 99 Good
#2: 34   Ok
#3:  0  Bad
#4:      NA
#5: 24  Bad
#6: 86 Good

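We match on the minimum absolute difference between the DT1$A and DT2$A values. Note that this finds the nearest value in either direction, not the next larger one, so 24 is matched to 0 ("Bad") rather than to 50 ("Ok") as the question asks.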
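To get the question's "next larger" rule in base R instead, one option is to keep only the reference values that are greater than or equal to the lookup value before taking the minimum. A minimal sketch; the helper name `next_larger_label` is hypothetical, and it assumes the lookup column has been converted to numeric so that "" becomes NA.

```r
# Hypothetical helper: for each value in x, return the label of the smallest
# reference value that is >= x (NA if x is NA or no such value exists)
next_larger_label <- function(x, ref_values, ref_labels) {
  vapply(x, function(v) {
    if (is.na(v)) return(NA_character_)
    ok <- ref_values >= v                  # candidates at or above v
    if (!any(ok)) return(NA_character_)    # v is above every reference value
    ref_labels[ok][which.min(ref_values[ok])]
  }, character(1))
}

ref_A <- c(100, 50, 10)
ref_B <- c("Good", "Ok", "Bad")
# as.numeric() turns the question's "" into NA
lookup_A <- as.numeric(c("99", "34", "5", "", "24", "86"))

df2 <- data.frame(A = lookup_A)
df2$B <- next_larger_label(df2$A, ref_A, ref_B)
df2
```

This loops over the lookup values one at a time, so for large tables the rolling join with `roll = -Inf` from the first answer is likely the better fit.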


Sample data (this answer uses 0 where the question uses 10 in DT1 and 5 in DT2):

DT1 <- data.table(A = c(100,50,0), B = c("Good","Ok","Bad"))
DT2 <- data.table(A = c(99,34,0,"",24,86))