Calculating the maximum date interval in R

Time: 2018-10-02 15:52:59

Tags: r time-series intervals lubridate

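The challenge is a data.frame with one grouping variable (id) and two date variables (start and stop). The date intervals are irregular, and for each group I am trying to calculate the number of days from the group's first start date up to the first gap between intervals.

Example data: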

data <- data.frame(
  id = c(1, 2, 2, 3, 3, 3, 3, 3, 4, 5),
  start = as.Date(c("2016-02-18", "2016-12-07", "2016-12-12", "2015-04-10", 
                    "2015-04-12", "2015-04-14", "2015-05-15", "2015-07-14", 
                    "2010-12-08", "2011-03-09")),
  stop = as.Date(c("2016-02-19", "2016-12-12", "2016-12-13", "2015-04-13", 
                   "2015-04-22", "2015-05-13", "2015-07-13", "2015-07-15", 
                   "2010-12-10", "2011-03-11"))
)

> data
   id      start       stop
1   1 2016-02-18 2016-02-19
2   2 2016-12-07 2016-12-12
3   2 2016-12-12 2016-12-13
4   3 2015-04-10 2015-04-13
5   3 2015-04-12 2015-04-22
6   3 2015-04-14 2015-05-13
7   3 2015-05-15 2015-07-13
8   3 2015-07-14 2015-07-15
9   4 2010-12-08 2010-12-10
10  5 2011-03-09 2011-03-11

The goal is a data.frame like this:

   id      start       stop duration_from_start
1   1 2016-02-18 2016-02-19                   2
2   2 2016-12-07 2016-12-12                   7
3   2 2016-12-12 2016-12-13                   7
4   3 2015-04-10 2015-04-13                  34
5   3 2015-04-12 2015-04-22                  34
6   3 2015-04-14 2015-05-13                  34
7   3 2015-05-15 2015-07-13                  34
8   3 2015-07-14 2015-07-15                  34
9   4 2010-12-08 2010-12-10                   3
10  5 2011-03-09 2011-03-11                   3

Or this:

  id      start       stop duration_from_start
1  1 2016-02-18 2016-02-19                   2
2  2 2016-12-07 2016-12-13                   7
3  3 2015-04-10 2015-05-13                  34
4  4 2010-12-08 2010-12-10                   3
5  5 2011-03-09 2011-03-11                   3

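It is important to detect the gap between rows 6 and 7 and to treat that point as the end of the maximum interval (34 days). An interval from 2018-10-01 to 2018-10-01 counts as 1 day.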
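The inclusive counting rule can be checked with plain Date arithmetic. This is only a minimal sketch; inclusive_days is a hypothetical helper name, not a lubridate function:

```r
# Hypothetical helper: number of days in [start, stop], counting both ends
inclusive_days <- function(start, stop) {
  as.numeric(as.Date(stop) - as.Date(start)) + 1
}

inclusive_days("2018-10-01", "2018-10-01")  # 1
inclusive_days("2015-04-10", "2015-05-13")  # 34
```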

My usual lubridate approach does not work for this example (interval %within% lag(interval)).

Any ideas?

1 answer:

Answer 0: (score: 2)

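library(magrittr)    # provides the %>% pipe
library(data.table)  # provides setDT(), shift() and rleid()
setDT(data)          # convert the data.frame to a data.table by reference

# For one id: start and stop of the first run of intervals that is not
# interrupted by a gap (rows are assumed to be sorted by start).
first_int <- function(start, stop){
  # TRUE where a row starts after the previous row's stop, i.e. a gap;
  # rleid() numbers the runs, so run 1 is everything before the first gap
  ind <- rleid((start - shift(stop, fill = Inf)) > 0) == 1
  list(start = min(start[ind]),
       stop  = max(stop[ind]))
}

# Collapse to one row per id, then add the inclusive duration in days
newdata <- 
  data[, first_int(start, stop), by = id] %>% 
     .[, duration := stop - start + 1]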


#    id      start       stop duration
# 1:  1 2016-02-18 2016-02-19   2 days
# 2:  2 2016-12-07 2016-12-13   7 days
# 3:  3 2015-04-10 2015-05-13  34 days
# 4:  4 2010-12-08 2010-12-10   3 days
# 5:  5 2011-03-09 2011-03-11   3 days
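
The result above has one row per id, which matches the second layout asked for in the question. For the first layout, where every original row carries its group's duration, one possible base R sketch follows. It does not use data.table. The helper name first_run_days is hypothetical. Like first_int above, it only compares each start with the previous row's stop and assumes the rows are sorted by start within each id.

```r
# Example data from the question
data <- data.frame(
  id = c(1, 2, 2, 3, 3, 3, 3, 3, 4, 5),
  start = as.Date(c("2016-02-18", "2016-12-07", "2016-12-12", "2015-04-10",
                    "2015-04-12", "2015-04-14", "2015-05-15", "2015-07-14",
                    "2010-12-08", "2011-03-09")),
  stop = as.Date(c("2016-02-19", "2016-12-12", "2016-12-13", "2015-04-13",
                   "2015-04-22", "2015-05-13", "2015-07-13", "2015-07-15",
                   "2010-12-10", "2011-03-11"))
)

# Hypothetical helper: inclusive length in days of the first gap-free run
# of intervals within one group (assumes rows sorted by start)
first_run_days <- function(start, stop) {
  n <- length(start)
  # TRUE where an interval begins strictly after the previous one ended
  gap <- c(FALSE, start[-1] > stop[-n])
  # Rows before the first gap form the first run
  in_first_run <- cumsum(gap) == 0
  as.numeric(max(stop[in_first_run]) - min(start[in_first_run])) + 1
}

# Compute one duration per id, then copy it onto every row of that id
idx <- split(seq_len(nrow(data)), data$id)
dur <- vapply(idx,
              function(i) first_run_days(data$start[i], data$stop[i]),
              numeric(1))
data$duration_from_start <- unname(dur[as.character(data$id)])

data$duration_from_start
# 2 7 7 34 34 34 34 34 3 3
```

With data.table already loaded, an update join from newdata back onto data by id would probably be the more idiomatic route. The base R version is shown here only because it runs without extra packages.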