Question

I am looking for (1) the name of, and (2) a (cleaner) method in R (base and data.table preferred) for, the following operation.

Input:

> d1
  id  x  y
1  1  1 NA
2  2 NA  3
3  3  4 NA
> d2
  id  x  y z
1  4 NA 30 a
2  3 20  2 b
3  2 14 NA c
4  1 15 97 d

(Note that the actual data.frames have hundreds of columns.)

Expected output:

> d1
  id  x  y z
1  1  1 97 d
2  2 14  3 c
3  3  4  2 b
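
To try the answers below, the two inputs can be rebuilt from the printed d1 and d2 tables. This is only a minimal sketch: the column types (integer id, numeric x and y, character z) are assumptions, since the printed tables do not show them.

```r
# Rebuild the two inputs from the printed tables in the question.
# Column types are assumed; stringsAsFactors = FALSE keeps z as
# character on R versions before 4.0 as well.
d1 <- data.frame(id = 1:3,
                 x  = c(1, NA, 4),
                 y  = c(NA, 3, NA))
d2 <- data.frame(id = 4:1,
                 x  = c(NA, 20, 14, 15),
                 y  = c(30, 2, NA, 97),
                 z  = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)
```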

PS:

This has probably been asked before, but I lack the vocabulary to search for it.

Answer 1

This is possible using dplyr::left_join:

library(dplyr)

# join on id, then keep d1's value where present and fall back to d2's
left_join(d1, d2, by = "id") %>%
    mutate(
        x = ifelse(!is.na(x.x), x.x, x.y),
        y = ifelse(!is.na(y.x), y.x, y.y)) %>%
    select(id, x, y, z)
#  id  x  y z
#1  1  1 97 d
#2  2 14  3 c
#3  3  4  2 b
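
Since the real data.frames have hundreds of columns, naming each column inside mutate() does not scale. Below is a minimal sketch of the same join-then-fill idea, looping over every shared column with dplyr::coalesce. The names `shared`, `joined` and `result` are made up for illustration, and the sample data is rebuilt from the question's tables with assumed column types.

```r
library(dplyr)

# Sample data rebuilt from the question (column types are assumptions).
d1 <- data.frame(id = 1:3, x = c(1, NA, 4), y = c(NA, 3, NA))
d2 <- data.frame(id = 4:1, x = c(NA, 20, 14, 15), y = c(30, 2, NA, 97),
                 z = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# Columns present in both inputs, excluding the join key.
shared <- setdiff(intersect(names(d1), names(d2)), "id")

# Shared columns come back as <name>.x (from d1) and <name>.y (from d2).
joined <- left_join(d1, d2, by = "id", suffix = c(".x", ".y"))

# For each shared column, prefer d1's value and fall back to d2's.
for (col in shared) {
  joined[[col]] <- coalesce(joined[[paste0(col, ".x")]],
                            joined[[paste0(col, ".y")]])
}

# Keep the original column names only, d1's columns first.
result <- joined[, union(names(d1), names(d2))]
result
```

Note that coalesce expects compatible column types on both sides, so a column that is numeric in one input and character in the other would need converting first.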

Answer 2

Assuming you are working with two data.frames, here is a base R solution:

# expand d1 to have the same columns as d2
d <- merge(d1, d2[, c("id", setdiff(names(d2), names(d1))), drop=FALSE],
    by="id", all.x=TRUE, all.y=FALSE)

# make sure that d2 also has the same columns as d1
d2 <- merge(d2, d1[, c("id", setdiff(names(d1), names(d2))), drop=FALSE],
    by="id", all.x=TRUE, all.y=FALSE)

# align rows and columns of d2 to match those in d
# (merge() sorts by id, so match against d rather than d1)
mask <- d2[match(d$id, d2$id), names(d)]

# replace the NAs in d with the corresponding values from mask
replace(d, is.na(d), mask[is.na(d)])
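
One possible caveat with replace(d, is.na(d), mask[is.na(d)]): indexing a data.frame with a logical matrix goes through as.matrix(), so when a character column such as z is present, the numeric replacement values may arrive as character and change the column types. A column-wise sketch in base R that fills each column separately and so keeps the types; the names `rn`, `from_d2` and `result` are illustrative only, and the data is rebuilt from the question with assumed types.

```r
# Sample data rebuilt from the question (column types are assumptions).
d1 <- data.frame(id = 1:3, x = c(1, NA, 4), y = c(NA, 3, NA))
d2 <- data.frame(id = 4:1, x = c(NA, 20, 14, 15), y = c(30, 2, NA, 97),
                 z = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# Row of d2 that matches each row of d1 (NA where an id is absent from d2).
rn <- match(d1$id, d2$id)

result <- d1
for (col in setdiff(names(d2), "id")) {
  from_d2 <- d2[[col]][rn]
  if (col %in% names(result)) {
    # Shared column: fill only the positions that are NA in d1.
    miss <- is.na(result[[col]])
    result[[col]][miss] <- from_d2[miss]
  } else {
    # Column that exists only in d2: take it over as is.
    result[[col]] <- from_d2
  }
}
result
```

Because the loop runs over names(d2), it should behave the same way whether there are three columns or several hundred.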

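If you don't mind, we could rewrite your question as a general matrix coalesce problem (i.e. any number of matrices, columns and rows), which does not seem to have been asked before.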

Edit:

Another base R solution is a hack of coalesce1a from How to implement coalesce efficiently in R:

coalesce.mat <- function(...) {
    ans <- ..1  
    for (elt in list(...)[-1]) {
        rn <- match(ans$id, elt$id)
        ans[is.na(ans)] <- elt[rn, names(ans)][is.na(ans)]
    }
    ans         
}

allcols <- Reduce(union, lapply(list(d1, d2), names))
do.call(coalesce.mat, 
    lapply(list(d1, d2), function(x) {
        x[, setdiff(allcols, names(x))] <- NA
        x 
    }))

Edit:

A data.table solution using coalesce1a, from Martin Morgan's answer to How to implement coalesce efficiently in R.
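
The code for this edit is not shown here. The following is only a sketch of how such a solution might look, not the original author's code. It assumes coalesce1a is defined roughly as below (written from memory of the linked answer, so check it against the source), and uses data.table::set to update d1 by reference. The names `shared`, `extra` and `rn` are illustrative, and the data is rebuilt from the question with assumed types.

```r
library(data.table)

# coalesce1a as recalled from the linked answer (an assumption):
# start from the first argument and fill its NAs from each later one.
coalesce1a <- function(...) {
  ans <- ..1
  for (elt in list(...)[-1]) {
    i <- which(is.na(ans))
    ans[i] <- elt[i]
  }
  ans
}

# Sample data rebuilt from the question (column types are assumptions).
d1 <- data.table(id = 1:3, x = c(1, NA, 4), y = c(NA, 3, NA))
d2 <- data.table(id = 4:1, x = c(NA, 20, 14, 15), y = c(30, 2, NA, 97),
                 z = c("a", "b", "c", "d"))

shared <- setdiff(intersect(names(d1), names(d2)), "id")  # columns in both
extra  <- setdiff(names(d2), names(d1))                   # columns only in d2

# Row of d2 that matches each row of d1.
rn <- match(d1$id, d2$id)

# Fill NAs in shared columns, then add d2-only columns, all by reference.
for (col in shared) set(d1, j = col, value = coalesce1a(d1[[col]], d2[[col]][rn]))
for (col in extra)  set(d1, j = col, value = d2[[col]][rn])
d1
```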

Answer 3

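We can use data.table together with coalesce from dplyr. Create vectors of the column names that are common to both datasets ('nm1') and of those that occur only in the second one ('nm2'). Convert the first dataset to a 'data.table' (setDT(d1)), join on the 'id' column, and assign (:=) the coalesced columns of the first and second datasets (the second dataset's columns carry the prefix i. where a column is common to both) to update the values in the first dataset.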

library(data.table)
nm1 <- setdiff(intersect(names(d1), names(d2)), 'id')
nm2 <- setdiff(names(d2), names(d1))
setDT(d1)[d2, c(nm1, nm2) := c(Map(dplyr::coalesce, mget(nm1), 
              mget(paste0("i.", nm1))), mget(nm2)), on = .(id)]
d1
#   id  x  y z
#1:  1  1 97 d
#2:  2 14  3 c
#3:  3  4  2 b
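
Since the question prefers base and data.table, the dplyr dependency can probably be dropped by swapping in data.table's own fcoalesce. A sketch of the same update join under that substitution; it assumes a data.table version that provides fcoalesce (1.12.4 or later, as far as I know) and that shared columns have the same type in both tables. The data is rebuilt from the question with assumed types.

```r
library(data.table)

# Sample data rebuilt from the question (column types are assumptions).
d1 <- data.table(id = 1:3, x = c(1, NA, 4), y = c(NA, 3, NA))
d2 <- data.table(id = 4:1, x = c(NA, 20, 14, 15), y = c(30, 2, NA, 97),
                 z = c("a", "b", "c", "d"))

nm1 <- setdiff(intersect(names(d1), names(d2)), "id")  # shared columns
nm2 <- setdiff(names(d2), names(d1))                   # columns only in d2

# Update join: for rows of d1 matched by id, keep d1's value where present,
# otherwise take d2's (prefixed i.), and bring over the d2-only columns.
d1[d2, c(nm1, nm2) := c(Map(fcoalesce, mget(nm1), mget(paste0("i.", nm1))),
                        mget(nm2)), on = .(id)]
d1
```

Rows of d2 with no match in d1 (id 4 here) are ignored by the update join, which matches the expected output in the question.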

Update existing data.frame with values from another one if missing
