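How to add a new column to one data frame based on time intervals in another data frame

Time: 2019-03-26 23:16:28

Tags: r

I have two data frames. One of them (df1) records the times at which a fish was detected in a specific area. The other (df2) records the periods during which divers were present in that same area. For example: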

datetime<- c("2016-08-01 06:00:02","2016-08-01 09:31:27","2016-08-01 13:34:02","2016-08-01 16:45:15","2016-08-02 09:07:12","2016-08-02 11:25:02","2016-08-02 17:25:02","2016-08-02 21:50:00")
df1<-data.frame(datetime)
df1$datetime<- as.POSIXct(df1$datetime, format = "%Y-%m-%d %H:%M:%S")
start<- c("2016-08-01 07:00:00","2016-08-01 08:30:00","2016-08-01 10:30:00","2016-08-01 16:00:00","2016-08-02 10:00:00","2016-08-02 16:00:00")
end<- c("2016-08-01 08:30:00","2016-08-01 10:00:00","2016-08-01 12:00:00","2016-08-01 17:30:00","2016-08-02 11:30:00","2016-08-02 17:30:00")
divers<-c(6,2,8,12,8,7)
df2<-data.frame(start,end,divers)
df2$start<- as.POSIXct(df2$start, format = "%Y-%m-%d %H:%M:%S")
df2$end<- as.POSIXct(df2$end, format = "%Y-%m-%d %H:%M:%S")

df1
             datetime
1 2016-08-01 06:00:02
2 2016-08-01 09:31:27
3 2016-08-01 13:34:02
4 2016-08-01 16:45:15
5 2016-08-02 09:07:12
6 2016-08-02 11:25:02
7 2016-08-02 17:25:02
8 2016-08-02 21:50:00

df2 # Notice there are four periods with divers on 2016-08-01 and only two on 2016-08-02.

                start                 end divers
1 2016-08-01 07:00:00 2016-08-01 08:30:00      6
2 2016-08-01 08:30:00 2016-08-01 10:00:00      2
3 2016-08-01 10:30:00 2016-08-01 12:00:00      8
4 2016-08-01 16:00:00 2016-08-01 17:30:00     12
5 2016-08-02 10:00:00 2016-08-02 11:30:00      8
6 2016-08-02 16:00:00 2016-08-02 17:30:00      7

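I want to add information about diver presence to df1 as a new column. In this new column, which we will call "divers", I want to show the number of divers present at the moment each fish was detected. If df1 records a fish detection at a time when, according to df2, there were no divers in the area, "df1$divers" should be 0. If a fish was detected while 5 divers were present, "df1$divers" should be 5. As an example of what I expect: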

datetime<- c("2016-08-01 06:00:02","2016-08-01 09:31:27","2016-08-01 13:34:02","2016-08-01 16:45:15","2016-08-02 09:07:12","2016-08-02 11:25:02","2016-08-02 17:25:02","2016-08-02 21:50:00")
divers<- c(0,2,0,12,0,8,7,0)
result<-data.frame(datetime,divers)
result$datetime<- as.POSIXct(result$datetime, format = "%Y-%m-%d %H:%M:%S")

result
             datetime divers
1 2016-08-01 06:00:02      0
2 2016-08-01 09:31:27      2
3 2016-08-01 13:34:02      0
4 2016-08-01 16:45:15     12
5 2016-08-02 09:07:12      0
6 2016-08-02 11:25:02      8
7 2016-08-02 17:25:02      7
8 2016-08-02 21:50:00      0

1 answer:

Answer 0 (score: 0)

Using base R, we can use sapply on the datetime column of df1 to find which periods in df2 contain each detection time (that is, the time lies between start and end), take the corresponding divers values, and sum them.

df1$divers <- sapply(df1$datetime, function(x) sum(with(df2, divers[x >= start & x <= end])))

df1
#             datetime divers
#1 2016-08-01 06:00:02      0
#2 2016-08-01 09:31:27      2
#3 2016-08-01 13:34:02      0
#4 2016-08-01 16:45:15     12
#5 2016-08-02 09:07:12      0
#6 2016-08-02 11:25:02      8
#7 2016-08-02 17:25:02      7
#8 2016-08-02 21:50:00      0

Alternatively, we can use dplyr and purrr with map_dbl:

library(dplyr)
library(purrr)

df1 %>% mutate(divers = map_dbl(datetime, ~ sum(with(df2, divers[. >= start & . <= end]))))

In the OP's example the sum is not strictly needed, because the start and end periods do not overlap. If periods can overlap, however, it is better to keep sum so that all divers present during that time are added up.
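To make the point about overlapping periods concrete, here is a minimal base R sketch. It is not part of the original answer: count_divers, to_time, the closed_end argument, and the two overlapping periods are all made up for illustration. It also shows one thing worth checking in real data: with the closed comparison x >= start & x <= end, a detection that falls exactly on a boundary shared by two periods (rows 1 and 2 of df2 both touch 2016-08-01 08:30:00) is counted in both. A half-open comparison is one possible way to avoid that, assuming it matches how the periods were recorded.

```r
# Hypothetical helper (not from the original answer): for each detection time,
# sum the divers of every period that contains it.
count_divers <- function(times, start, end, divers, closed_end = TRUE) {
  vapply(seq_along(times), function(i) {
    x <- times[i]  # indexing keeps the POSIXct class
    inside <- if (closed_end) x >= start & x <= end else x >= start & x < end
    sum(divers[inside])  # sum of an empty selection is 0
  }, numeric(1))
}

# Fixed time zone so the sketch behaves the same on every machine.
to_time <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Made-up periods: the two rows overlap between 08:00:00 and 08:30:00.
start  <- to_time(c("2016-08-01 07:00:00", "2016-08-01 08:00:00"))
end    <- to_time(c("2016-08-01 08:30:00", "2016-08-01 10:00:00"))
divers <- c(6, 2)

# Made-up detections: before any period, inside the overlap,
# exactly on the end of the first period, and inside the second period only.
times <- to_time(c("2016-08-01 06:00:00", "2016-08-01 08:15:00",
                   "2016-08-01 08:30:00", "2016-08-01 09:00:00"))

closed    <- count_divers(times, start, end, divers)
half_open <- count_divers(times, start, end, divers, closed_end = FALSE)

closed     # 0 8 8 2
half_open  # 0 8 2 2
```

With the data from the question, count_divers(df1$datetime, df2$start, df2$end, df2$divers) should give the same column as the sapply call above, since none of the detections falls on a period boundary.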