Finding total duration by hour

Time: 2018-02-20 20:43:44

Tags: r time datatable duration lubridate

I have the following data frame (length is in seconds):

dates <- data.frame(
  start = as.POSIXct(c(
    "2010-04-03 03:02:38 UTC", "2010-04-03 06:03:14 UTC", "2010-04-20 03:05:52 UTC",
    "2010-04-20 03:17:42 UTC", "2010-04-21 03:09:38 UTC", "2010-04-21 07:10:14 UTC",
    "2010-04-21 08:12:52 UTC", "2010-04-23 03:13:42 UTC", "2010-04-23 03:25:42 UTC",
    "2010-04-23 03:36:38 UTC", "2010-04-23 08:58:14 UTC", "2010-04-24 03:21:52 UTC",
    "2010-04-24 03:22:42 UTC", "2010-04-24 07:24:19 UTC", "2010-04-24 07:55:19 UTC"
  )),
  length = c(3600, 300, 900, 3600, 300, 900, 3600, 300, 900, 3600, 300, 900, 3600, 300, 3600)
)

> dates
                 start length
1  2010-04-03 03:02:38   3600
2  2010-04-03 06:03:14    300
3  2010-04-20 03:05:52    900
4  2010-04-20 03:17:42   3600
5  2010-04-21 03:09:38    300
6  2010-04-21 07:10:14    900
7  2010-04-21 08:12:52   3600
8  2010-04-23 03:13:42    300
9  2010-04-23 03:25:42    900
10 2010-04-23 03:36:38   3600
11 2010-04-23 08:58:14    300
12 2010-04-24 03:21:52    900
13 2010-04-24 03:22:42   3600
14 2010-04-24 07:24:19    300
15 2010-04-24 07:55:19   3600

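I want to calculate the total duration per clock hour, i.e. from 00:00:00 to 01:00:00, from 01:00:00 to 02:00:00, and so on. Sometimes an event crosses an hour boundary: if the start time is 07:55:19 and the duration is 3600 (as in the last row), I need to split it in two and count 281 seconds for the 07:00:00 to 08:00:00 period and 3319 seconds for the 08:00:00 to 09:00:00 period.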
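The split described above can be checked with a little arithmetic. The following is a minimal base R sketch for the last row only; the explicit `tz = "UTC"` is an assumption made so the example does not depend on the local time zone.

```r
# Last row of the example: start 07:55:19, length 3600 seconds.
# tz = "UTC" is an assumption for this sketch.
start <- as.POSIXct("2010-04-24 07:55:19", tz = "UTC")
len   <- 3600

# Seconds elapsed since the top of the hour. The modulo trick relies on the
# time zone having a whole-hour UTC offset.
elapsed   <- as.numeric(start) %% 3600
remaining <- 3600 - elapsed             # seconds left before 08:00:00

first_part  <- min(len, remaining)      # counted in 07:00:00-08:00:00
second_part <- len - first_part         # spills into 08:00:00-09:00:00
c(first_part, second_part)
```

This gives 281 and 3319, the two values stated in the question.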

I can find the total duration for the 03:00:00-04:00:00 period like this:

library(lubridate)

dates$endTime<-dates$start+dates$length
dates$newTime<-format(dates$start, format="%H:%M:%S")
dates$endTime<-format(dates$endTime, format="%H:%M:%S")
dates$dur3<-ifelse(hms(dates$endTime)<hms("04:00:00"), seconds(hms(dates$endTime)-hms(dates$newTime)), seconds(hms("04:00:00")-hms(dates$newTime)))

sum(dates[dates$dur3>0,"dur3"])
12920

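I was thinking of computing, for every row, the duration that falls in each of the 24 hourly periods and then summing them up, but is there a more efficient way to do this?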

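2 Answers:

Answer 0: (score: 1)

Here is my take on the problem, even though I'm not entirely sure I understood the task. First, I compute how many seconds remain in the starting hour (rest) and how much of each length spills over into the next hour (excess):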

dates$rest <- 3600 - as.numeric(format(dates$start, "%M"))*60 - as.numeric(format(dates$start, "%S"))
dates$excess <- dates$length - dates$rest

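Next, we loop over the rows whose length extends into the next hour. Keep in mind that this only works if the length is capped at 3600, as in the example. If it is not, the loop needs to be extended a bit.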

for (row in which(dates$excess > 0)) {
  row_to_copy <- dates[row, ]
  # keep only the part that fits in the starting hour
  dates[row, "length"] <- dates[row, "length"] - row_to_copy$excess
  # append the remainder as a new row that starts one hour later
  row_to_copy$start <- row_to_copy$start + 3600
  row_to_copy$length <- row_to_copy$excess
  dates <- rbind(dates, row_to_copy)
}
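For lengths longer than 3600 seconds, one possible extension is to cut each event at every hour boundary it crosses. The sketch below is only an illustration in base R: `split_by_hour()` is a hypothetical helper, the input row is made up, and the code assumes `length > 0` and a time zone with a whole-hour UTC offset (UTC is used here).

```r
# split_by_hour() is a hypothetical helper written for this sketch.
# Assumptions: length > 0, and clock hours line up with multiples of
# 3600 epoch seconds (true for UTC and other whole-hour offsets).
split_by_hour <- function(start, length) {
  s <- as.numeric(start)                               # start in epoch seconds
  e <- s + length                                      # end (exclusive)
  bins <- seq(floor(s / 3600), ceiling(e / 3600) - 1)  # clock hours touched
  seg_start <- pmax(bins * 3600, s)                    # clip each hour to the event
  seg_end   <- pmin((bins + 1) * 3600, e)
  data.frame(
    start  = as.POSIXct(seg_start, origin = "1970-01-01", tz = "UTC"),
    length = seg_end - seg_start
  )
}

# Illustrative input (not from the question): one event spanning three hours.
ev <- data.frame(
  start  = as.POSIXct("2010-04-24 07:55:19", tz = "UTC"),
  length = 5000
)

# One piece per clock hour, then total seconds per hour of day.
pieces <- do.call(rbind, Map(split_by_hour, as.numeric(ev$start), ev$length))
totals <- tapply(pieces$length, format(pieces$start, "%H", tz = "UTC"), sum)
totals
```

Here the 5000 seconds are split into 281 for hour 07, 3600 for hour 08 and 1119 for hour 09. Because every event is cut at each boundary, no cap on the length is needed.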

With the completed data set, we now define the column used to group by hour. Note that we could also group by "date - hour" if we wanted to.

library(dplyr)

dates$hours <- format(dates$start, "%H")
res_df <-
  dates %>%
  group_by(hours) %>%
  summarize(length_total = sum(length))

Result

> res_df
# A tibble: 6 x 2
  hours length_total
  <chr>        <dbl>
1 03           12920
2 04            4780
3 06             300
4 07            1481
5 08            6253
6 09             966
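The "date - hour" grouping mentioned above could look roughly like the following. This is a sketch with made-up rows, and it uses base R `aggregate()` instead of dplyr so that it runs without extra packages; the column name `date_hour` is an assumption.

```r
# Illustrative data (not from the question): the same clock hour on two days.
ev <- data.frame(
  start  = as.POSIXct(c("2010-04-03 03:02:38",
                        "2010-04-20 03:05:52",
                        "2010-04-20 03:30:00"), tz = "UTC"),
  length = c(300, 900, 600)
)

# A key that keeps the date, so 03:00 on different days is not pooled.
ev$date_hour <- format(ev$start, "%Y-%m-%d %H")
by_date_hour <- aggregate(length ~ date_hour, data = ev, FUN = sum)
by_date_hour
```

With this key, 2010-04-03 hour 03 totals 300 seconds and 2010-04-20 hour 03 totals 1500 seconds, whereas grouping on "%H" alone would merge them into a single row.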

Answer 1: (score: 1)

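> library(lubridate)
> a <- dates$start
> # seconds from each start time until the top of the next hour
> b <- as.numeric(difftime(a + hours(1) - second(a) - minutes(minute(a)), a, units = "secs"))
> # e: seconds that overflow into the following hour (0 if none)
> e <- pmax(dates$length - b, 0)
> d <- c(pmin(b, dates$length), e)
> tapply(d, c(hour(a), hour(a) + 1), sum)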
    3     4     6     7     8     9 
12920  4780   300  1481  6253   966