df1 <- data.frame(id=c(1,2,3,4,5,8), var=c("a","b","c","d","e","t"), stringsAsFactors = FALSE)
df2 <- data.frame(id=c(1,2,3,4,5,6,7), var=c("e","f","c","d","e","g","h"), stringsAsFactors = FALSE)
df <- data.frame(id=c(1,2,3,4,5,6,7,8))
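I need to join these to get the var values for df, but I want the var value from df1 in preference to df2, and if df1 has no matching id, I want to take it from df2. I have the following, but is there a simpler way? And how can I add a column showing where each var came from?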
library(dplyr)
# var.x comes from df1 and var.y from df2
df %>% left_join(df1, by="id") %>% left_join(df2, by="id") %>%
  dplyr::mutate(var=ifelse(!is.na(var.x), var.x, var.y))
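Answer 0 (score: 0)
We can use a three-way SQL join like this; coalesce returns the first non-NULL value, so df1 takes priority over df2: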
library(sqldf)
sqldf("select a.*, coalesce(b.var, c.var) as var
from df a
left join df1 b using(id)
left join df2 c using(id)")
giving:
id var
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 g
7 7 h
8 8 t
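If it needs to go into a pipeline:
library(dplyr)  # for %>%
df %>%
  # inside the braces, . is the piped data frame; [.] quotes that name for SQLite
  { sqldf("select a.*, coalesce(b.var, c.var) as var
    from [.] a
    left join df1 b using(id)
    left join df2 c using(id)") }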
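The query above does not cover the second part of the question (where each var came from). A minimal sketch of one way to add that, assuming a CASE expression is acceptable; the column name source is my own choice, not something from the question, and the order by clause is added only to make the row order explicit:

```r
library(sqldf)

df1 <- data.frame(id = c(1,2,3,4,5,8), var = c("a","b","c","d","e","t"), stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1,2,3,4,5,6,7), var = c("e","f","c","d","e","g","h"), stringsAsFactors = FALSE)
df  <- data.frame(id = c(1,2,3,4,5,6,7,8))

# 'source' is a hypothetical column name; it records which table supplied var
res <- sqldf("select a.*,
                     coalesce(b.var, c.var) as var,
                     case when b.var is not null then 'df1'
                          when c.var is not null then 'df2' end as source
              from df a
              left join df1 b using(id)
              left join df2 c using(id)
              order by a.id")
res
```

The CASE branches are checked in the same order as the arguments to coalesce, so the label should always agree with the value that was picked; an id missing from both tables would get NULL in both columns.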
Answer 1 (score: 0)
First use bind_rows on df1 and df2; if you set the .id argument, you can see where var came from.
library(dplyr)
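bind_rows(df1 = df1, df2 = df2, .id = "from") %>%
  distinct(id, .keep_all = TRUE) %>%  # keeps the first row per id, so df1 wins over df2
  right_join(df, by = "id") %>%
  arrange(id)  # row order after right_join() can vary across dplyr versions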
# from id var
# 1 df1 1 a
# 2 df1 2 b
# 3 df1 3 c
# 4 df1 4 d
# 5 df1 5 e
# 6 df2 6 g
# 7 df2 7 h
# 8 df1 8 t
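For comparison, here is a sketch that stays closer to the two left joins in the question, using coalesce for the value and case_when for the source. The suffixes .df1 and .df2 are names I picked for illustration; the priority (df1 first, then df2) is assumed from the code in the question.

```r
library(dplyr)

df1 <- data.frame(id = c(1,2,3,4,5,8), var = c("a","b","c","d","e","t"), stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1,2,3,4,5,6,7), var = c("e","f","c","d","e","g","h"), stringsAsFactors = FALSE)
df  <- data.frame(id = c(1,2,3,4,5,6,7,8))

res <- df %>%
  left_join(df1, by = "id") %>%
  # both sides now have a 'var' column, so the suffixes are applied here
  left_join(df2, by = "id", suffix = c(".df1", ".df2")) %>%
  transmute(
    id,
    var  = coalesce(var.df1, var.df2),        # first non-missing value wins
    from = case_when(!is.na(var.df1) ~ "df1", # label matches the coalesce order
                     !is.na(var.df2) ~ "df2")
  )
res
```

Because left_join keeps the rows of df in their original order, no extra sorting step should be needed here.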