R: Subsetting two data frames based on multiple conditions

Time: 2019-06-06 15:37:55

Tags: r conditional-statements statements

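I have two data frames (df1 and df2), and I want a new data frame (df3) containing all rows of df1 whose "date" and "time_of_day" match a row in df2. I also want to save the df1 rows that do not match in another new data frame (df4).

I tried dplyr's filter function, but I don't think I wrote it correctly: I get a new data frame with the same number of rows as df1, when it should contain only the rows that match on both date and time of day.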

> df1
          date time_of_day     
1  2018-06-03     morning 
2  2018-06-06     afternoon 
4  2018-06-09     morning 
5  2018-06-10     afternoon 

> df2
          date time_of_day     
1  2018-06-03     morning 
2  2018-06-06     morning 
3  2018-06-08     morning 
4  2018-06-09     morning 
5  2018-06-10     afternoon
6  2018-06-11     afternoon

#creating a new data frame
df3 <- filter(df1, date %in% df2$date & time_of_day %in% df2$time_of_day)
#another try 
df3 <- df1[df1$date %in% df2$date & df1$time_of_day %in% df2$time_of_day,]

This is what I want:

> df3
          date time_of_day     
1  2018-06-03     morning 
2  2018-06-09     morning 
3  2018-06-10     afternoon 

> df4
          date time_of_day     
1  2018-06-06     afternoon 

2 answers:

Answer 0 (score: 3)

We can use inner_join to get the matching rows:

library(dplyr)
df3 <- inner_join(df1, df2)
df3
#       date time_of_day
#1 2018-06-03     morning
#2 2018-06-09     morning
#3 2018-06-10   afternoon

and anti_join to get the rows of df1 that have no match in df2:

df4 <- anti_join(df1, df2)
df4
#       date time_of_day
#1 2018-06-06   afternoon
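
As a hedged variation on the joins above (not part of the original answer): semi_join is dplyr's filtering counterpart to anti_join. Unlike inner_join, it never repeats a df1 row when df2 contains the same date/time pair more than once. Passing `by` explicitly also avoids the "Joining, by = ..." message. The sketch below rebuilds the question's data with data.frame() so that it runs on its own; the variable name `keys` is only illustrative.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained
df1 <- data.frame(
  date = c("2018-06-03", "2018-06-06", "2018-06-09", "2018-06-10"),
  time_of_day = c("morning", "afternoon", "morning", "afternoon"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date = c("2018-06-03", "2018-06-06", "2018-06-08",
           "2018-06-09", "2018-06-10", "2018-06-11"),
  time_of_day = c("morning", "morning", "morning",
                  "morning", "afternoon", "afternoon"),
  stringsAsFactors = FALSE
)

# Columns that must match together (illustrative name)
keys <- c("date", "time_of_day")

# Rows of df1 that have a matching date/time pair in df2
df3 <- semi_join(df1, df2, by = keys)

# Rows of df1 that have no matching date/time pair in df2
df4 <- anti_join(df1, df2, by = keys)
```

With the question's data both approaches give the same rows, because df2 has no duplicated date/time pairs.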

Data

df1 <- structure(list(date = c("2018-06-03", "2018-06-06", "2018-06-09", 
"2018-06-10"), time_of_day = c("morning", "afternoon", "morning", 
"afternoon")), class = "data.frame", row.names = c("1", "2", 
"4", "5"))

df2 <- structure(list(date = c("2018-06-03", "2018-06-06", "2018-06-08", 
"2018-06-09", "2018-06-10", "2018-06-11"), time_of_day = c("morning", 
"morning", "morning", "morning", "afternoon", "afternoon")),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Answer 1 (score: 2)

Adapting your base R code, you can do the following. If you want to remove duplicate rows, you can wrap the results in unique().

df1[paste0(df1$date, df1$time_of_day) %in% paste0(df2$date, df2$time_of_day), ]
        date time_of_day
1 2018-06-03     morning
4 2018-06-09     morning
5 2018-06-10   afternoon

df1[!paste0(df1$date, df1$time_of_day) %in% paste0(df2$date, df2$time_of_day), ]
        date time_of_day
2 2018-06-06   afternoon
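
One caveat about building keys with paste0, offered as a side note rather than as part of the original answer: without a separator, different column pairs can collapse into the same string. This cannot happen with the question's data, since the dates all have the same width, but it can with free-form values. The small sketch below uses made-up data frames `a` and `b` and assumes "|" never appears in the data.

```r
# Hypothetical data where concatenation without a separator is ambiguous
a <- data.frame(x = c("ab", "a"), y = c("c", "bc"), stringsAsFactors = FALSE)
b <- data.frame(x = "a", y = "bc", stringsAsFactors = FALSE)

# Without a separator, both rows of `a` become "abc"
nosep_match <- paste0(a$x, a$y) %in% paste0(b$x, b$y)
nosep_match
# [1] TRUE TRUE    (the first TRUE is a false match)

# With a separator that does not occur in the data, the keys stay distinct
sep_match <- paste(a$x, a$y, sep = "|") %in% paste(b$x, b$y, sep = "|")
sep_match
# [1] FALSE  TRUE
```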

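Your earlier attempt did not work because df1$date %in% df2$date & df1$time_of_day %in% df2$time_of_day evaluates to TRUE TRUE TRUE TRUE, so it keeps every row. That is, every date in df1 appears somewhere in df2, and every time of day in df1 appears somewhere in df2, but the two conditions are checked independently rather than against the same df2 row.

Edit:

Alternatively, in dplyr you can use intersect and setdiff, which work on data frames and remove duplicates: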
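
To make the difference concrete, here is a minimal sketch using the question's data. It evaluates the two independent %in% tests and then a row-wise test on the combined date/time key. The intermediate names (`date_hit`, `time_hit`, `both`, `pair_hit`) are only illustrative.

```r
# Same data as in the question, rebuilt so the example is self-contained
df1 <- data.frame(
  date = c("2018-06-03", "2018-06-06", "2018-06-09", "2018-06-10"),
  time_of_day = c("morning", "afternoon", "morning", "afternoon"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date = c("2018-06-03", "2018-06-06", "2018-06-08",
           "2018-06-09", "2018-06-10", "2018-06-11"),
  time_of_day = c("morning", "morning", "morning",
                  "morning", "afternoon", "afternoon"),
  stringsAsFactors = FALSE
)

# Each column is tested on its own against the whole df2 column
date_hit <- df1$date %in% df2$date                 # TRUE TRUE TRUE TRUE
time_hit <- df1$time_of_day %in% df2$time_of_day   # TRUE TRUE TRUE TRUE
both <- date_hit & time_hit                        # TRUE TRUE TRUE TRUE

# Testing the date/time pair as one key compares whole rows instead
pair_hit <- paste(df1$date, df1$time_of_day) %in%
  paste(df2$date, df2$time_of_day)
pair_hit
# [1]  TRUE FALSE  TRUE  TRUE
```

The second row of df1 (2018-06-06, afternoon) passes both column tests because df2 contains that date and contains "afternoon", just not in the same row.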

dplyr::intersect(df1, df2)
        date time_of_day
1 2018-06-03     morning
2 2018-06-09     morning
3 2018-06-10   afternoon

dplyr::setdiff(df1, df2)
        date time_of_day
1 2018-06-06   afternoon