I have two data.frames similar to the following:
#Dt1
Id Date Weight
1 2 10
2 3 20
3 4 30
4 4 30
5 6 40
and
#DT2
Id Date Weight late
1 2 10 3
2 3 20 4
3 4 30 5
8 5 10 6
I want to merge these files, keeping only the rows whose Id appears in one of them but not the other, like this:
#Dt.final
Id Date Weight late
4 4 30 NA
5 6 40 NA
8 5 10 6
Thanks. My original files are much larger than these.
Answer 0: (score: 4)
In addition to @yarnabrina's answer, anti_join from dplyr is also what you need, but we have to apply it twice. anti_join(x, y) drops all observations in x that have a match in y:
> full_join(anti_join(df1, df2, by = 'Id'), anti_join(df2, df1, by = 'Id'))
Joining, by = c("Id", "Date", "Weight")
Id Date Weight late
1 4 4 30 NA
2 5 6 40 NA
3 8 5 10 6
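Since the two anti_join results share no Id, the full_join above only stacks their rows. A minimal sketch of the same idea with bind_rows (a swapped-in function, not what this answer uses), assuming df1 and df2 hold the question's data; the names only_in_df1, only_in_df2 and result are just illustrative:

```r
library(dplyr)

# Data from the question
df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

# Rows of df1 whose Id has no match in df2, and the reverse
only_in_df1 <- anti_join(df1, df2, by = "Id")
only_in_df2 <- anti_join(df2, df1, by = "Id")

# Stack the two pieces; the late column missing from df1 is filled with NA
result <- bind_rows(only_in_df1, only_in_df2)
result
```

This also avoids the "Joining, by = ..." message, because no join keys have to be guessed for the final step.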
Answer 1: (score: 3)
Are you looking for something like this?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))
full_join(x = filter(.data = df1, Id %in% setdiff(x = df1$Id, y = df2$Id)),
          y = filter(.data = df2, Id %in% setdiff(x = df2$Id, y = df1$Id)))
#> Joining, by = c("Id", "Date", "Weight")
#> Id Date Weight late
#> 1 4 4 30 NA
#> 2 5 6 40 NA
#> 3 8 5 10 6
Created on 2019-05-03 by the reprex package (v0.2.1)
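The reason setdiff is called twice above is that it is not symmetric: setdiff(x, y) only returns the elements of x that are missing from y. A small sketch on the Id vectors from the question (the variable names here are only for illustration):

```r
ids1 <- c(1, 2, 3, 4, 5)  # Ids in df1
ids2 <- c(1, 2, 3, 8)     # Ids in df2

only_in_1 <- setdiff(ids1, ids2)  # 4 5
only_in_2 <- setdiff(ids2, ids1)  # 8

# Ids that appear in exactly one of the two vectors
unmatched <- union(only_in_1, only_in_2)  # 4 5 8
```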
Answer 2: (score: 2)
Maybe this solves your problem. It is a hand-rolled answer, but I hope it is not too bad:
df_1 <- data.frame(ID = factor(1:5, levels=1:8),
                   Date = c(2, 3, 4, 4, 6),
                   Weight = c(10, 20, 20, 30, 40))
df_2 <- data.frame(ID = factor(4:8, levels=1:8),
                   Date = c(2, 3, 4, 4, 6),
                   Weight = c(10, 20, 20, 30, 40),
                   late = c(1, 2, 3, 4, 5))
# Temporary data frame: rows of df_1 whose ID is absent from df_2,
# with a late column of NA added so the columns match df_2
df_temp <- data.frame(
  df_1[!df_1$ID %in% df_2$ID, ],
  late = NA)
# Append the rows of df_2 whose ID is absent from df_1
df.final <- rbind(
  df_temp,
  df_2[!df_2$ID %in% df_1$ID, ])
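For larger files it may be handy to wrap the same %in% logic in a function. The following is only a sketch in base R: rows_with_unique_ids is a hypothetical helper name, not part of any package, and it has only been reasoned through for numeric columns like those in the question.

```r
# Hypothetical helper: keep rows whose key value appears in only one of the
# two data frames, padding columns that one side lacks with NA.
rows_with_unique_ids <- function(a, b, key = "Id") {
  only_a <- a[!a[[key]] %in% b[[key]], , drop = FALSE]
  only_b <- b[!b[[key]] %in% a[[key]], , drop = FALSE]
  all_cols <- union(names(a), names(b))
  pad <- function(d) {
    # Add each missing column as NA, then put columns in a common order
    for (col in setdiff(all_cols, names(d))) d[[col]] <- rep(NA, nrow(d))
    d[, all_cols, drop = FALSE]
  }
  out <- rbind(pad(only_a), pad(only_b))
  rownames(out) <- NULL
  out
}

# Data from the question
df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

final <- rows_with_unique_ids(df1, df2, key = "Id")
final
```

Unlike the temporary data frame above, the padding step does not hard-code the late column, so it should carry over to files where the extra columns have other names.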