Merging only the non-matching rows in R

Date: 2019-05-03 15:20:28

Tags: r dataframe merge

I have two data.frames similar to the following:

#Dt1
Id  Date Weight  
1    2     10
2    3     20
3    4     30
4    4     30
5    6     40

#DT2
Id  Date Weight late 
1    2     10     3
2    3     20     4
3    4     30     5
8    5     10     6

I would like to merge them keeping only the rows whose Id appears in one data.frame but not the other, like this:

#Dt.final
Id  Date Weight late
4    4     30    NA
5    6     40    NA
8    5     10     6

Thanks. My original files are larger than these.

3 answers:

Answer 0 (score: 4)

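In addition to @yarnabrina's answer, anti_join from dplyr is also what you need, but it has to be applied twice, once in each direction. anti_join(x, y) drops every observation in x that has a match in y: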

> full_join(anti_join(df1, df2, by = 'Id'), anti_join(df2, df1, by = 'Id'))
Joining, by = c("Id", "Date", "Weight")
  Id Date Weight late
1  4    4     30   NA
2  5    6     40   NA
3  8    5     10    6
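
For reference, here is a self-contained sketch of the same two-way anti_join idea, using the data from the question. It assumes dplyr is installed, and it swaps full_join for bind_rows (which fills columns missing from one side with NA), so nothing is joined on Date or Weight. The names only_in_df1, only_in_df2 and dt_final are chosen here for illustration.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

# Rows of df1 whose Id has no match in df2, and vice versa
only_in_df1 <- anti_join(df1, df2, by = "Id")
only_in_df2 <- anti_join(df2, df1, by = "Id")

# Stack the two pieces; the late column is NA for the rows coming from df1
dt_final <- bind_rows(only_in_df1, only_in_df2)
```

Because the rows are stacked rather than joined, no "Joining, by = ..." message is printed.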

Answer 1 (score: 3)

Are you looking for something like this?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))

df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

full_join(x = filter(.data = df1, Id %in% setdiff(x = df1$Id, y = df2$Id)),
          y = filter(.data = df2, Id %in% setdiff(x = df2$Id, y = df1$Id)))
#> Joining, by = c("Id", "Date", "Weight")
#>   Id Date Weight late
#> 1  4    4     30   NA
#> 2  5    6     40   NA
#> 3  8    5     10    6

Created on 2019-05-03 by the reprex package (v0.2.1)

Answer 2 (score: 2)
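
If dplyr is not available, the same setdiff idea can be sketched in base R. This is an illustrative variant rather than part of the answer: it does an outer merge on the shared columns and then keeps only the Ids that occur in exactly one data frame. The names unmatched_ids, merged and dt_final are assumptions made for this example.

```r
# Sample data from the question
df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

# Ids present in exactly one of the two data frames (symmetric difference)
unmatched_ids <- union(setdiff(df1$Id, df2$Id), setdiff(df2$Id, df1$Id))

# Outer join on the shared columns (Id, Date, Weight), then keep unmatched Ids
merged <- merge(df1, df2, all = TRUE)
dt_final <- merged[merged$Id %in% unmatched_ids, ]
rownames(dt_final) <- NULL
```

On large inputs the outer merge builds every row before filtering, so subsetting each data frame first may be cheaper.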

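Maybe this solves your problem. It is a hand-rolled answer (using its own sample data, which differs from the question's), but I hope it is not too bad: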

df_1 <- data.frame(ID = factor(1:5, levels=1:8),
                   Date = c(2, 3, 4, 4, 6),
                   Weight = c(10, 20, 20, 30, 40))

df_2 <- data.frame(ID = factor(4:8, levels=1:8),
                   Date = c(2, 3, 4, 4, 6),
                   Weight = c(10, 20, 20, 30, 40),
                   late = c(1, 2, 3, 4, 5))

# Temporary data frame: rows of df_1 whose ID is not in df_2, with late set to NA
df_temp <- data.frame(
  df_1[!df_1$ID %in% df_2$ID, ],
  late = NA)

df.final <- rbind(
  df_temp,
  df_2[!df_2$ID %in% df_1$ID, ])
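
The manual approach above can be wrapped in a small reusable function. The sketch below is a generalisation under stated assumptions, not the answerer's code: merge_unmatched is a hypothetical helper name, and it pads whichever columns either side lacks with NA before calling rbind, so it does not depend on late being the only extra column.

```r
# Hypothetical helper: keep rows whose key value occurs in only one of the
# two data frames, padding columns missing from either side with NA.
merge_unmatched <- function(x, y, key) {
  x_only <- x[!x[[key]] %in% y[[key]], , drop = FALSE]
  y_only <- y[!y[[key]] %in% x[[key]], , drop = FALSE]
  all_cols <- union(names(x), names(y))

  # Add any missing column as NA, then put columns in a common order
  pad <- function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- rep(NA, nrow(d))
    d[, all_cols, drop = FALSE]
  }

  out <- rbind(pad(x_only), pad(y_only))
  rownames(out) <- NULL
  out
}

# Usage with the data from the question
df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

dt_final <- merge_unmatched(df1, df2, key = "Id")
```

Only the key column is compared, so rows that share an Id are dropped even when their Date or Weight values differ.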