Filter a data frame based on multiple conditions from a different data frame

Time: 2016-08-29 18:23:10

Tags: r dplyr

I want to filter one data frame ('data') based on multiple values in a different data frame ('key').

My 'key' looks like this:

library(dplyr)

exhibit.name  <- c("lions", "otters", "penguins")
exhibit.start <- c(as.Date("2016-04-01"), as.Date("2016-05-01"), as.Date("2016-06-01"))
exhibit.end   <- c(as.Date("2016-04-30"), as.Date("2016-05-31"), as.Date("2016-06-30"))
# tibble() replaces the deprecated data_frame()
key           <- tibble(exhibit.name, exhibit.start, exhibit.end)

My 'data' looks like this:

exhibit.name <- c("lions", "lions", "otters", 
                  "otters", "penguins", "penguins")
exhibit.date <- c(as.Date("2016-04-15"), as.Date("2016-12-15"), as.Date("2016-05-15"),
                  as.Date("2016-02-15"), as.Date("2016-06-15"), as.Date("2016-10-15"))
# tibble() replaces the deprecated data_frame(); it is re-exported by dplyr
data         <- dplyr::tibble(exhibit.name, exhibit.date)

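I need to filter 'data' so that it returns only the rows where data$exhibit.name matches key$exhibit.name and data$exhibit.date falls between the corresponding key$exhibit.start and key$exhibit.end dates. The resulting data frame should look like this: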

> valid.exhibits
1|lions   |2016-04-15
2|otters  |2016-05-15
3|penguins|2016-06-15

Thanks!

1 Answer:

Answer 0: (score: 4)

We can left_join 'data' with 'key' on exhibit.name and then filter on the date range:

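library(dplyr)
data %>% 
   left_join(key, by = "exhibit.name") %>%
   # keep rows whose date lies strictly between the exhibit's start and end
   filter(exhibit.start < exhibit.date, exhibit.end > exhibit.date) %>% 
   select(1:2)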
#     exhibit.name exhibit.date
#         <chr>       <date>
#1        lions   2016-04-15
#2       otters   2016-05-15
#3     penguins   2016-06-15
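
The filter above uses strict inequalities, so a visit dated exactly on an exhibit's first or last day would be dropped. If the boundary days should count as valid (an assumption, since the question does not say), a minimal sketch of the same join-then-filter idea with inclusive bounds could look like this. It rebuilds the question's sample data so it runs on its own, and uses inner_join so names missing from 'key' are discarded at the join step:

```r
library(dplyr)

# Sample data copied from the question
key <- tibble(
  exhibit.name  = c("lions", "otters", "penguins"),
  exhibit.start = as.Date(c("2016-04-01", "2016-05-01", "2016-06-01")),
  exhibit.end   = as.Date(c("2016-04-30", "2016-05-31", "2016-06-30"))
)

data <- tibble(
  exhibit.name = c("lions", "lions", "otters",
                   "otters", "penguins", "penguins"),
  exhibit.date = as.Date(c("2016-04-15", "2016-12-15", "2016-05-15",
                           "2016-02-15", "2016-06-15", "2016-10-15"))
)

valid.exhibits <- data %>%
  # attach each exhibit's start/end dates to its rows
  inner_join(key, by = "exhibit.name") %>%
  # inclusive bounds: the start and end days themselves count (assumption)
  filter(exhibit.date >= exhibit.start, exhibit.date <= exhibit.end) %>%
  # select by name rather than by position
  select(exhibit.name, exhibit.date)

print(valid.exhibits)
```

On the sample data this keeps the same three rows as the strict version, because none of the dates fall on a boundary.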

We can also use a non-equi (conditional) join, available in the development version of data.table, i.e. v1.9.7+:

library(data.table)
setDT(key)
setDT(data)[key, on = .(exhibit.name, exhibit.date > exhibit.start, 
          exhibit.date < exhibit.end), new := 1]
na.omit(data)[, new := NULL][]
#   exhibit.name exhibit.date
#1:        lions   2016-04-15
#2:       otters   2016-05-15
#3:     penguins   2016-06-15
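
For comparison, the same lookup can be sketched in base R without either package. This is only an illustration, not part of the original answer, and it assumes each exhibit name appears at most once in 'key' (match() returns only the first hit) and that the boundary days should be included:

```r
# Sample data copied from the question, as plain data frames
key <- data.frame(
  exhibit.name  = c("lions", "otters", "penguins"),
  exhibit.start = as.Date(c("2016-04-01", "2016-05-01", "2016-06-01")),
  exhibit.end   = as.Date(c("2016-04-30", "2016-05-31", "2016-06-30")),
  stringsAsFactors = FALSE
)

data <- data.frame(
  exhibit.name = c("lions", "lions", "otters",
                   "otters", "penguins", "penguins"),
  exhibit.date = as.Date(c("2016-04-15", "2016-12-15", "2016-05-15",
                           "2016-02-15", "2016-06-15", "2016-10-15")),
  stringsAsFactors = FALSE
)

# For every row of 'data', find the row of 'key' with the same exhibit name
idx <- match(data$exhibit.name, key$exhibit.name)

# TRUE where the name was found and the date lies inside that exhibit's window
in.window <- !is.na(idx) &
  data$exhibit.date >= key$exhibit.start[idx] &
  data$exhibit.date <= key$exhibit.end[idx]

valid.exhibits <- data[in.window, ]
rownames(valid.exhibits) <- NULL  # renumber rows 1..n

print(valid.exhibits)
```

If an exhibit could have several date ranges in 'key', this lookup would miss all but the first, and one of the join-based approaches above would be the safer choice.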