Is there a way to merge with one table and, if there is no match, merge with a second table?

Time: 2020-03-27 14:12:43

Tags: r merge dplyr

df1 <- data.frame(id=c(1,2,3,4,5,8), var=c("a","b","c","d","e","t"), stringsAsFactors = FALSE)
df2 <- data.frame(id=c(1,2,3,4,5,6,7), var=c("e","f","c","d","e","g","h"), stringsAsFactors = FALSE)
df <- data.frame(id=c(1,2,3,4,5,6,7,8))

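I need a join that fills in var for every id in df. I want the var value from df1, and where df1 has no matching id, I want to take it from df2 instead. I have the code below, but is there a simpler way? And how can I add a column showing which table each var came from?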

df %>% left_join(df1, by="id") %>% left_join(df2, by="id") %>%
  dplyr::mutate(var=ifelse(!is.na(var.x), var.x, var.y))

2 answers:

Answer 0 (score: 0)

We can use an SQL three-way join like this:

library(sqldf)
sqldf("select a.*, coalesce(b.var, c.var) as var
 from df a
 left join df1 b using(id)
 left join df2 c using(id)")

giving:

  id var
1  1   a
2  2   b
3  3   c
4  4   d
5  5   e
6  6   g
7  7   h
8  8   t

If it needs to go into a pipeline:

df %>%
    { sqldf("select a.*, coalesce(b.var, c.var) as var
     from [.] a
     left join df1 b using(id)
     left join df2 c using(id)") }
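
The question also asks where each var came from. One possible extension of the query above, as a sketch only: a CASE expression mirrors the coalesce order and labels each row. The column name source and the order by clause are assumptions added here for illustration, not part of the original answer.

```r
library(sqldf)

# Sample data from the question
df1 <- data.frame(id = c(1, 2, 3, 4, 5, 8), var = c("a", "b", "c", "d", "e", "t"), stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7), var = c("e", "f", "c", "d", "e", "g", "h"), stringsAsFactors = FALSE)
df <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8))

# coalesce() takes df1's var when present, otherwise df2's;
# the CASE expression records which table supplied the value.
# "source" is a hypothetical column name chosen for this sketch.
res <- sqldf("select a.*,
       coalesce(b.var, c.var) as var,
       case when b.var is not null then 'df1'
            when c.var is not null then 'df2'
       end as source
  from df a
  left join df1 b using(id)
  left join df2 c using(id)
  order by a.id")

print(res)
```

Rows whose id appears in neither table would get NULL (NA in R) in both var and source.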

Answer 1 (score: 0)

First use bind_rows on df1 and df2; if you set the .id argument, you can see where each var came from.

library(dplyr)

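bind_rows(df1 = df1, df2 = df2, .id = "from") %>%
  # keep the first row per id; df1 rows come first, so df1 wins over df2
  distinct(id, .keep_all = TRUE) %>%
  right_join(df, by = "id") %>%
  # sort by id, since the row order of right_join() may depend on the dplyr version
  arrange(id)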

#   from id var
# 1  df1  1   a
# 2  df1  2   b
# 3  df1  3   c
# 4  df1  4   d
# 5  df1  5   e
# 6  df2  6   g
# 7  df2  7   h
# 8  df1  8   t
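
For comparison, the two-join approach from the question can be tightened with dplyr's coalesce() and case_when(), which also yields the source column. This is a minimal sketch rather than part of either answer; the suffixes .df1 and .df2 and the column name from are illustrative choices.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(id = c(1, 2, 3, 4, 5, 8), var = c("a", "b", "c", "d", "e", "t"), stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7), var = c("e", "f", "c", "d", "e", "g", "h"), stringsAsFactors = FALSE)
df <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8))

res <- df %>%
  left_join(df1, by = "id") %>%
  # the clashing var columns become var.df1 and var.df2
  left_join(df2, by = "id", suffix = c(".df1", ".df2")) %>%
  mutate(
    # label the table that supplies the value, checked in priority order
    from = case_when(!is.na(var.df1) ~ "df1",
                     !is.na(var.df2) ~ "df2"),
    # coalesce() returns the first non-missing value per row
    var = coalesce(var.df1, var.df2)
  ) %>%
  select(id, var, from)

print(res)
```

Because both joins start from df, the result keeps df's row order, and swapping the arguments to coalesce() and the case_when() branches would give df2 priority instead.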