R data.table: update join

Posted: 2017-06-08 10:38:54

Tags: r data.table

Suppose I have two data.tables:

X <- data.table(id = 1:5, L = letters[1:5])

   id L
1:  1 a
2:  2 b
3:  3 c
4:  4 d
5:  5 e

Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

   id  L  N
1:  3 NA 10
2:  4  g NA
3:  5  h 12

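Is it possible to do a left outer join of X with Y on id using built-in data.table functionality? If not, I would like to write a function (e.g. leftOuterJoin) that produces the following expected output, where for ids present in Y the values from Y replace those from X (even when Y's value is NA):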

leftOuterJoin(X, Y, on = "id")

   id  L  N
1:  1  a NA
2:  2  b NA
3:  3 NA 10
4:  4  g NA
5:  5  h 12

I tried this, without success:

X[Y, on = "id"]

   id L i.L  N
1:  3 c  NA 10
2:  4 d   g NA
3:  5 e   h 12
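X[Y, on = "id"] looks up each row of Y in X, so the result has one row per row of Y, which is why ids 1 and 2 disappear. Swapping the roles keeps every row of X instead. A minimal sketch on the question's data (res is just an illustrative name); note that Y's L stays as L and X's L becomes i.L, so this is still not the exact layout asked for:

```r
library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

# Y[X, on = "id"]: one row per row of X (a left join of X with Y);
# Y's columns keep their names, X's non-key columns get the "i." prefix
res <- Y[X, on = "id"]
res
```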

I also tried this, which is almost what I want:

setkey(X, id)
setkey(Y, id)
merge(X, Y, all.x = TRUE)

   id L.x L.y  N
1:  1   a  NA NA
2:  2   b  NA NA
3:  3   c  NA 10
4:  4   d   g NA
5:  5   e   h 12
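Starting from the merge() result, the remaining step is to choose, row by row, which L to keep: Y's value for ids that occur in Y (even when it is NA, as for id 3) and X's value otherwise. A sketch of that post-processing, assuming base ifelse() is acceptable; m is an illustrative name:

```r
library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

m <- merge(X, Y, by = "id", all.x = TRUE)   # columns: id, L.x, L.y, N
# Ids found in Y take Y's L (even NA); the others keep X's L
m[, L := ifelse(id %in% Y$id, L.y, L.x)]
m[, c("L.x", "L.y") := NULL]                # drop the suffixed columns
setcolorder(m, c("id", "L", "N"))
m
```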

2 answers:

Answer 0 (score: 6)

This is an update join:

library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
X[Y, on=.(id), c("L", "N"):=.(i.L, i.N)][]
#    id  L  N
# 1:  1  a NA
# 2:  2  b NA
# 3:  3 NA 10
# 4:  4  g NA
# 5:  5  h 12

This gives you the desired result.
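Note that := adds and overwrites columns of X in place, so X itself is changed. If you want the leftOuterJoin(X, Y, on = "id") call from the question without touching X, one option is to copy X first. A minimal sketch: leftOuterJoin is the hypothetical name from the question, not a data.table function, and it assumes every non-key column of Y should overwrite, or be added to, the copy of X:

```r
library(data.table)

# Hypothetical helper named after the question; not part of data.table.
# Copies x, then update-joins every non-key column of y onto the copy.
leftOuterJoin <- function(x, y, on) {
  res  <- copy(x)                  # keep the caller's x unmodified
  cols <- setdiff(names(y), on)    # non-key columns to bring over from y
  res[y, on = on, (cols) := mget(paste0("i.", cols))]
  res[]
}

X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
out <- leftOuterJoin(X, Y, on = "id")
out
```

Rows of X without a match in Y keep their original values, and columns that exist only in Y (N here) are filled with NA for those rows.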
Here I found a solution for multiple columns:

library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

X[Y, on=.(id), names(Y)[-1]:=mget(paste0("i.", names(Y)[-1]))]

Another variant:

n <- setdiff(names(Y), "id")   # non-key columns of Y; the join column id needs no update
X[Y, on=.(id), (n):=mget(paste0("i.", n))]
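All of these rely on the i. prefix: inside X[Y, on = ..., j], a column name prefixed with i. refers to Y (the table passed as i), while x. refers to X's own column. A small sketch showing both side by side on the question's data (both, from_X and from_Y are illustrative names):

```r
library(data.table)
X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

# For each matched row: x.L is X's L, i.L is Y's L
both <- X[Y, on = "id", .(id, from_X = x.L, from_Y = i.L)]
both
```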

Answer 1 (score: 0)

I may have missed a few things; please correct me if there is a better solution. I generally like to write functions for this kind of thing.

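The goal here is to make every option available: joining, updating the join variables themselves, taking values from differently named variables...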

> update.DT <- function(DATA1, DATA2, join.variable, overwrite.variable, overwrite.with.variable) {
+       
+       DATA1[DATA2, c(overwrite.variable) := mget(paste0("i.", overwrite.with.variable)), on = join.variable][]
+       
+     }
> X <- data.table(id = 1:5, L = letters[1:5], PS = rep(59, 5))
> X2 <- copy(X); X3 <- copy(X)   # := works by reference, so X2 <- X would not give a separate table
> Y <- data.table(id = 3:5, id2 = 11:13, L = c("z", "g", "h"), PS = rep(61, 3))
> X
   id L PS
1:  1 a 59
2:  2 b 59
3:  3 c 59
4:  4 d 59
5:  5 e 59
> Y
   id id2 L PS
1:  3  11 z 61
2:  4  12 g 61
3:  5  13 h 61
> update.DT(DATA1 = X, DATA2 = Y, join.variable = "id", overwrite.variable = c("L"), overwrite.with.variable = c("L"))
   id L PS
1:  1 a 59
2:  2 b 59
3:  3 z 59
4:  4 g 59
5:  5 h 59
> update.DT(DATA1 = X2, DATA2 = Y, join.variable = "id", overwrite.variable = c("L", "PS"), overwrite.with.variable = c("L", "PS"))
   id L PS
1:  1 a 59
2:  2 b 59
3:  3 z 61
4:  4 g 61
5:  5 h 61
> update.DT(DATA1 = X2, DATA2 = Y, join.variable = "id", overwrite.variable = c("L", "PS", "id"), overwrite.with.variable = c("L", "PS", "id2"))
   id L PS
1:  1 a 59
2:  2 b 59
3: 11 z 61
4: 12 g 61
5: 13 h 61
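Because update.DT (like any := call) changes DATA1 by reference, plain assignment such as B <- A only gives the same table a second name, and updating one name updates the other. A small sketch of the difference, with illustrative names A, B and C; copy() is data.table's function for making an independent copy:

```r
library(data.table)
A <- data.table(id = 1:3, v = 0)
B <- A          # second name for the same table, no copy is made
C <- copy(A)    # independent deep copy

A[id == 2, v := 99]   # update A by reference
B$v   # reflects the update as well
C$v   # unchanged
```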