Extract uncommon values from 2 data frames in R

Posted: 2016-09-04 22:12:05

Tags: r dataframe merge

Given two data frames containing dates:

d1
#        dates
#    2016-08-01
#    2016-08-02
#    2016-08-03
#    2016-08-04

d2
#        dates
#    2016-08-02
#    2016-08-03
#    2016-08-04
#    2016-08-05
#    2016-08-06

How can I create a third data frame containing only the values that the two do not have in common?

d3
#        dates
#    2016-08-01
#    2016-08-05
#    2016-08-06

Data (df1 and df2 below correspond to d1 and d2 above):

df1 <- structure(list(dates = structure(c(17014, 17015, 17016, 17017),
    class = "Date")), .Names = "dates", row.names = c(NA, -4L),
    class = "data.frame")

df2 <- structure(list(dates = structure(c(17015, 17016, 17017, 17018,
    17019), class = "Date")), .Names = "dates", row.names = c(NA, -5L),
    class = "data.frame")
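A possibly more readable way to build the same two data frames is with as.Date() and seq(). This is a minimal sketch, assuming the dates listed for d1 and d2 above are the intended values (the raw numbers in structure() are days since 1970-01-01, so 17014 is 2016-08-01).

```r
# Build the example data frames from readable date strings instead of raw
# day counts; seq() steps one day at a time for the requested length.
df1 <- data.frame(dates = seq(as.Date("2016-08-01"), by = "day", length.out = 4))
df2 <- data.frame(dates = seq(as.Date("2016-08-02"), by = "day", length.out = 5))
str(df1)
```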

2 Answers:

Answer 0 (score: 2)

Suppose you have two vectors x and y; the elements they do not share are

c(x[!(x %in% y)], y[!(y %in% x)])
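If the expression is needed in several places, it could be wrapped in a small function. The name sym_diff is hypothetical (it is not a base R function); this sketch assumes x and y are atomic vectors of the same type.

```r
# Hypothetical helper: elements of x that are not in y,
# followed by elements of y that are not in x.
sym_diff <- function(x, y) {
  c(x[!(x %in% y)], y[!(y %in% x)])
}

sym_diff(c("a", "b", "c"), c("b", "c", "d"))
# [1] "a" "d"
```

Unlike setdiff(), this keeps repeated values, so wrap the result in unique() if the inputs may contain duplicates.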

If you are working with data frames, then as long as your dates column is "character" or "Date" rather than "factor", you can do

rbind(subset(df1, !(df1$dates %in% df2$dates)),
      subset(df2, !(df2$dates %in% df1$dates)))
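A different technique, not part of this answer, is to stack both data frames and drop every date that appears more than once, using duplicated() in both directions. This sketch assumes each date occurs at most once within each data frame; otherwise a date repeated inside a single frame would also be dropped.

```r
# Example data matching the question
df1 <- data.frame(dates = as.Date(c("2016-08-01", "2016-08-02",
                                    "2016-08-03", "2016-08-04")))
df2 <- data.frame(dates = as.Date(c("2016-08-02", "2016-08-03", "2016-08-04",
                                    "2016-08-05", "2016-08-06")))

both <- rbind(df1, df2)
# TRUE for any row whose date also appears somewhere else in the stacked data
dup <- duplicated(both$dates) | duplicated(both$dates, fromLast = TRUE)
# drop = FALSE keeps the one-column result as a data frame
d3 <- both[!dup, , drop = FALSE]
d3
```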

A simple vector example

x <- 1:5
y <- 3:8
c(x[!(x %in% y)], y[!(y %in% x)])
# [1] 1 2 6 7 8
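For comparison, the same result can also be written with the base R set functions union() and setdiff(). These return unique values, so the two forms can differ when x or y contains duplicates.

```r
x <- 1:5
y <- 3:8
# Values only in x, combined with values only in y
union(setdiff(x, y), setdiff(y, x))
# [1] 1 2 6 7 8
```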

A vector of "Date" values

x <- seq(from = as.Date("2016-01-01"), length = 5, by = 1)
y <- seq(from = as.Date("2016-01-03"), length = 5, by = 1)
c(x[!(x %in% y)], y[!(y %in% x)])
# [1] "2016-01-01" "2016-01-02" "2016-01-06" "2016-01-07"

The example data frames from the question

rbind(subset(df1, !(df1$dates %in% df2$dates)),
      subset(df2, !(df2$dates %in% df1$dates)))

#       dates
#1 2016-08-01
#4 2016-08-05
#5 2016-08-06
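The rows keep their original row names (1, 4, 5). If the numbering should match the d3 layout in the question, one option is to reset the row names afterwards. This sketch rebuilds the question's data with as.Date() so it runs on its own.

```r
df1 <- data.frame(dates = as.Date(c("2016-08-01", "2016-08-02",
                                    "2016-08-03", "2016-08-04")))
df2 <- data.frame(dates = as.Date(c("2016-08-02", "2016-08-03", "2016-08-04",
                                    "2016-08-05", "2016-08-06")))

d3 <- rbind(subset(df1, !(df1$dates %in% df2$dates)),
            subset(df2, !(df2$dates %in% df1$dates)))
# Renumber the rows 1..n instead of keeping the original row names
rownames(d3) <- NULL
d3
```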

Answer 1 (score: 1)

You could simply use joins, as others have already shown. Personally, I like using the set operations in base R (see ?setdiff). Like this:

# if they are just character/factor variables
setdiff(d1$dates, d2$dates)
# if they are date variables
setdiff(as.character(d1$dates), as.character(d2$dates))
# then convert back with as.Date(setdiff(...))
# note: setdiff(a, b) only returns values of a that are not in b
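Because setdiff(a, b) only returns values of a that are missing from b, getting the values that are uncommon to both needs both directions. A minimal sketch, assuming the dates columns are Date objects and converting through character as in this answer:

```r
d1 <- data.frame(dates = as.Date(c("2016-08-01", "2016-08-02",
                                   "2016-08-03", "2016-08-04")))
d2 <- data.frame(dates = as.Date(c("2016-08-02", "2016-08-03", "2016-08-04",
                                   "2016-08-05", "2016-08-06")))

a <- as.character(d1$dates)
b <- as.character(d2$dates)
# Dates only in d1, then dates only in d2, converted back to Date
uncommon <- as.Date(union(setdiff(a, b), setdiff(b, a)))
d3 <- data.frame(dates = uncommon)
d3
```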

Using this, you can filter the data.frame based on the result or, as @ZheyuanLi has indirectly pointed out, exclude rows by matching:

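# drop = FALSE keeps a one-column data frame from collapsing to a vector
# If they are date variables
d2[!as.character(d2$dates) %in% as.character(d1$dates), , drop = FALSE]
# If they are character/factor variables
d2[!d2$dates %in% d1$dates, , drop = FALSE]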
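Both expressions above keep only the rows of d2 that are missing from d1. To build the full d3 from the question, one approach is to apply the same indexing in both directions and stack the results; this sketch assumes d1 and d2 share the same single dates column.

```r
d1 <- data.frame(dates = as.Date(c("2016-08-01", "2016-08-02",
                                   "2016-08-03", "2016-08-04")))
d2 <- data.frame(dates = as.Date(c("2016-08-02", "2016-08-03", "2016-08-04",
                                   "2016-08-05", "2016-08-06")))

keep1 <- !(d1$dates %in% d2$dates)  # rows of d1 with no match in d2
keep2 <- !(d2$dates %in% d1$dates)  # rows of d2 with no match in d1
# drop = FALSE keeps each one-column subset as a data frame so rbind() works
d3 <- rbind(d1[keep1, , drop = FALSE], d2[keep2, , drop = FALSE])
d3
```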