Find overlapping rows between data frames using dplyr?

Asked: 2018-03-22 19:04:43

Tags: r dplyr

library(dplyr)

# data_frame() is deprecated; tibble() is the current equivalent
df1 <- tibble(time1 = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
              time2 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
              id = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"))
df2 <- tibble(time = sort(runif(100, 0, 10)),
              C = rbinom(100, 1, 0.5))

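For each row in df1, I would like to find the rows in df2 whose time falls between time1 and time2, and then assign the median C value of that set of df2 rows to a new column in df1. I'm sure there is some simple way to do this in dplyr with the between function, but I'm new to R and haven't been able to figure it out. Thanks!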

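2 Answers:

Answer 0: (score: 0)

Here is one approach: use the merge function to perform what is essentially a SQL-style cross join, then use the between function to keep only the matching rows: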

library(tidyverse)
merge(df1, df2, all = TRUE)  %>%
    rowwise() %>%
    mutate(time_between = between(time, time1, time2)) %>%
    filter(time_between) %>%
    group_by(time1, time2, id) %>%
    summarise(med_C = median(C))
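
To see why merge acts as a cross join here: when the two data frames share no column names, merge returns their Cartesian product. A minimal sketch on toy data (the names left_df and right_df are made up for illustration and are not part of the question):

```r
# Toy data frames with no column names in common (hypothetical names).
left_df  <- data.frame(id = c("a", "b"), time1 = c(0, 1), time2 = c(1, 2))
right_df <- data.frame(time = c(0.2, 0.7, 1.5), C = c(1, 0, 1))

# With no shared columns, merge() pairs every row of left_df
# with every row of right_df.
crossed <- merge(left_df, right_df, all = TRUE)

nrow(crossed)  # 2 * 3 = 6 rows
```

For the data in the question, the intermediate table would have 10 * 100 = 1000 rows, so this approach can get expensive as the inputs grow.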

Because filter can drop rows of df1 that have no overlapping rows in df2, an alternative that keeps every row of df1 is:

merge(df1, df2, all = TRUE)  %>%
    rowwise() %>%
    mutate(time_between = between(time, time1, time2)) %>%
    group_by(time1, time2, id) %>%
    summarise(med_C = median(ifelse(time_between, C, NA), na.rm = TRUE))
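
If a recent dplyr is available, the cross join can probably be avoided altogether. The sketch below assumes dplyr 1.1.0 or later, where join_by() accepts inequality conditions; the toy tables intervals and events are hypothetical stand-ins for df1 and df2, not part of the original answer:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for join_by() inequality conditions

# Toy data (hypothetical); interval "c" deliberately has no matching times.
intervals <- tibble(
  time1 = c(0, 1, 2),
  time2 = c(1, 2, 3),
  id    = c("a", "b", "c")
)
events <- tibble(
  time = c(0.2, 0.5, 0.9, 1.5),
  C    = c(1, 0, 1, 1)
)

result <- intervals %>%
  # Keep every interval; match events with time1 <= time <= time2.
  left_join(events, by = join_by(time1 <= time, time2 >= time)) %>%
  group_by(time1, time2, id) %>%
  # Intervals without a match carry C = NA, so their median is NA.
  summarise(med_C = median(C, na.rm = TRUE), .groups = "drop")

result
```

Because this is a left join, intervals with no overlapping events stay in the result with med_C equal to NA, which matches the behaviour of the second snippet above.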

Answer 1: (score: 0)

You can do this in base R using sapply:

df1$median_c <- sapply(seq_along(df1$id), function(i) {
    median(df2$C[df2$time > df1$time1[i] & df2$time < df1$time2[i]])
})
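
Note that the comparison above uses strict inequalities, whereas dplyr's between is inclusive at both ends; with continuous random times the two rarely differ. A minimal variant with inclusive bounds, using vapply so the result is always a numeric vector, might look like this (the toy tables intervals and events are hypothetical):

```r
# Toy data (hypothetical names); interval "c" has no matching times.
intervals <- data.frame(time1 = c(0, 1, 2),
                        time2 = c(1, 2, 3),
                        id    = c("a", "b", "c"))
events <- data.frame(time = c(0.2, 0.5, 0.9, 1.5),
                     C    = c(1, 0, 1, 1))

intervals$median_c <- vapply(seq_len(nrow(intervals)), function(i) {
  # Inclusive bounds, mirroring between(time, time1, time2).
  inside <- events$time >= intervals$time1[i] &
            events$time <= intervals$time2[i]
  # median() of an empty vector is NA, so unmatched intervals get NA.
  median(events$C[inside])
}, numeric(1))

intervals
```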