Question

I have a data frame containing start times and lengths (in seconds):

dates<-data.frame(start=as.POSIXct(c("2010-04-03 03:02:38 UTC","2010-04-03 06:03:14 UTC","2010-04-20 03:05:52 UTC","2010-04-20 03:17:42 UTC","2010-04-21 03:09:38 UTC","2010-04-21 07:10:14 UTC","2010-04-21 08:12:52 UTC","2010-04-23 03:13:42 UTC","2010-04-23 03:25:42 UTC","2010-04-23 03:36:38 UTC","2010-04-23 08:58:14 UTC","2010-04-24 03:21:52 UTC","2010-04-24 03:22:42 UTC","2010-04-24 07:24:19 UTC","2010-04-24 07:55:19 UTC")),length=c(3600,300,900,3600,300,900,3600,300,900,3600,300,900,3600,300,900))

> dates
                 start length
1  2010-04-03 03:02:38   3600
2  2010-04-03 06:03:14    300
3  2010-04-20 03:05:52    900
4  2010-04-20 03:17:42   3600
5  2010-04-21 03:09:38    300
6  2010-04-21 07:10:14    900
7  2010-04-21 08:12:52   3600
8  2010-04-23 03:13:42    300
9  2010-04-23 03:25:42    900
10 2010-04-23 03:36:38   3600
11 2010-04-23 08:58:14    300
12 2010-04-24 03:21:52    900
13 2010-04-24 03:22:42   3600
14 2010-04-24 07:24:19    300
15 2010-04-24 07:55:19    900

I need to find the total duration (length) for the period 2010-04-02 00:00:00 to 2010-04-21 09:00:00, and the total duration (length) for the period 2010-04-23 03:15:00 to 2010-04-24 08:00:00.

The tricky part is that a given length can run past the end of the specified period, and I don't want to count that extra duration.

I would like to get totals for:

2010-04-02 00:00:00 to 2010-04-21 09:00:00
2010-04-23 03:15:00 to 2010-04-24 08:00:00

I was thinking of using lubridate and defining an interval for each row, then summing the durations, but I can't figure it out.

Answer 1

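I'm not sure exactly what the question is. The other answer simply sums the lengths of the rows whose start time falls within the specified interval. I have instead interpreted the question as wanting to handle events whose length may run past the end of the specified period, without counting the time beyond the end of the period (and likewise for events that start before the period begins). For example, row 7 runs well past 2010-04-21 09:00:00. This is why providing expected output is useful!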

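Anyway, here is one way to do what I think you mean, wrapped in a function. The approach is basically to create a new start and end for each event, set to the edge of the specified interval whenever the event would extend beyond it. I may have missed some edge cases, so improvements are welcome!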
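The clamping approach described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the answerer's own code: the function name `clipped_duration`, its arguments, and the three-row `sample_events` data frame (rows 5 to 7 of the question's data, created with `tz = "UTC"`) are all hypothetical.

```r
# Hypothetical helper: total seconds of events that fall inside [from, to].
# `from` and `to` are assumed to be POSIXct values in the same time zone
# as df$start.
clipped_duration <- function(df, from, to) {
  from  <- as.numeric(from)
  to    <- as.numeric(to)
  start <- as.numeric(df$start)
  end   <- start + df$length
  # Clamp each event's start and end to the edges of the period.
  overlap <- pmin(end, to) - pmax(start, from)
  # Events entirely outside the period give a negative overlap; count them as 0.
  sum(pmax(overlap, 0))
}

# Rows 5 to 7 of the question's data (assumed to be UTC for this sketch).
sample_events <- data.frame(
  start  = as.POSIXct(c("2010-04-21 03:09:38", "2010-04-21 07:10:14",
                        "2010-04-21 08:12:52"), tz = "UTC"),
  length = c(300, 900, 3600)
)

from <- as.POSIXct("2010-04-21 00:00:00", tz = "UTC")
to   <- as.POSIXct("2010-04-21 09:00:00", tz = "UTC")

clipped_duration(sample_events, from, to)
# [1] 4028
```

The last event starts at 08:12:52 and would end at 09:12:52, so only 2828 of its 3600 seconds are counted (300 + 900 + 2828 = 4028). Because every row is clamped individually, the result does not depend on which event happens to start first or last.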


Answer 2

Another possible solution uses the first and last functions from dplyr. first and last let us adjust the sum of lengths for the first and last rows only.

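library(dplyr)

# Sum `length` for the events that overlap the period, trimming the part of
# the earliest event that falls before start_time and the part of the latest
# event that runs past end_time. Only those two rows are adjusted.
calculate_duration <- function(df, start_time, end_time){
  start_time <- as.POSIXct(start_time)
  end_time <- as.POSIXct(end_time)

  df %>%
    # keep only the events that overlap the period
    filter((start + length) >= start_time & start < end_time) %>%
    arrange(start) %>%
    summarise(last_time = last(start) + last(length),
       sum = sum(length) -
       # seconds by which the last event overruns end_time
       ifelse(last_time > end_time,
             difftime(last_time, end_time, units = 'secs'), 0L) -
       # seconds by which the first event begins before start_time
       ifelse(first(start) < start_time,
             difftime(start_time, first(start), units = 'secs'), 0L) ) %>%
    select(sum)

}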

calculate_duration(dates,"2010-04-02 00:00:00", "2010-04-21 09:00:00")
#    sum
#1 12428

calculate_duration(dates,"2010-04-23 03:15:00", "2010-04-24 08:00:00")
#    sum
#1 10103


# Data

dates<-data.frame(start=as.POSIXct(c("2010-04-03 03:02:38 UTC","2010-04-03 06:03:14 UTC",
"2010-04-20 03:05:52 UTC","2010-04-20 03:17:42 UTC","2010-04-21 03:09:38 UTC",
"2010-04-21 07:10:14 UTC","2010-04-21 08:12:52 UTC","2010-04-23 03:13:42 UTC",
"2010-04-23 03:25:42 UTC","2010-04-23 03:36:38 UTC","2010-04-23 08:58:14 UTC",
"2010-04-24 03:21:52 UTC","2010-04-24 03:22:42 UTC","2010-04-24 07:24:19 UTC",
"2010-04-24 07:55:19 UTC")),
length=c(3600,300,900,3600,300,900,3600,300,900,3600,300,900,3600,300,900))

Answer 3

Here is an example:

library(lubridate)

t0 <- as.POSIXct('2010-04-02 00:00:00')
t1 <- as.POSIXct('2010-04-21 09:00:00')

sum(dates$length[dates$start %within% interval(t0,t1)])
# [1] 13200
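
The `%within%` filter above counts the full length of every event that starts inside the period, including the part of row 7 that runs past `t1`. A possible way to trim the overruns while staying with lubridate, as the question suggests, is to build one interval per row and intersect it with the period. This is only a sketch: the variable names and the three-row sample (rows 8, 11 and 15 of the question's data, created with `tz = "UTC"`) are assumptions made for illustration.

```r
library(lubridate)

# Rows 8, 11 and 15 of the question's data (assumed to be UTC for this sketch).
events <- data.frame(
  start  = as.POSIXct(c("2010-04-23 03:13:42", "2010-04-23 08:58:14",
                        "2010-04-24 07:55:19"), tz = "UTC"),
  length = c(300, 900, 900)
)
events$length[2] <- 300  # row 11 has length 300 in the question's data

period <- interval(as.POSIXct("2010-04-23 03:15:00", tz = "UTC"),
                   as.POSIXct("2010-04-24 08:00:00", tz = "UTC"))

# One interval per event, from its start to start + length seconds.
event_spans <- interval(events$start, events$start + events$length)

# Part of each event that lies inside the period (NA when there is no overlap).
overlap <- lubridate::intersect(event_spans, period)

# Total overlapping time in seconds, ignoring events outside the period.
total_secs <- sum(int_length(overlap), na.rm = TRUE)
total_secs
```

For these three rows the pieces inside the period are 222, 300 and 281 seconds: the first event begins 78 seconds before the period starts, and the last one would otherwise run 619 seconds past its end.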

Calculating the duration of time intervals within a given time period
