Calculate the lagged difference in days across a year boundary

Time: 2019-10-23 14:55:52

Tags: r tidyr lubridate

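I have a table with IDs and stop and start dates. I need to mutate a new column called "running_time_in_days" that gives, for each ID, the difference between the start date in one year and the stop date in the following year (the process runs through January 1, so each duration spans the new year).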

library(readr)
library(dplyr)
library(lubridate)

# read in some test data
test <- read_csv("id, measure, measure_date
        1, start, 1998-10-3 
        1, stop, 1999-5-1
        1, stop, 2000-4-15
        1, start, 1999-11-9
        2, stop, 2000-6-1
        2, stop, 1999-5-25
        2, start, 1999-12-1 
        2, start, 1998-11-12")
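# format "measure_date" as a date
# (the format string must be passed by name; the second positional argument of as_date() is tz)
test$measure_date <- test$measure_date %>% as_date(format = '%Y-%m-%d')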

The output might look like this:

    id      start       stop        duration
    1   1998-10-03  1999-05-01  stop - start (days)
    1   1999-11-09  2000-04-15  stop - start (days)
    2   1998-11-12  1999-05-25  stop - start (days)
    2   1999-12-01  2000-06-01  stop - start (days)

1 Answer:

Answer 0: (score: 0)

See if this works for you:

test %>% 
  # create unique row identifier for wide form, since there are
  # multiple start / stop dates associated with each id
  group_by(id) %>% 
  arrange(measure_date) %>% 
  mutate(id2 = rep(seq(1, n()/2), each = 2)) %>% 
  ungroup() %>% 

  # convert to wide form & calculate difference between dates
  tidyr::spread(measure, measure_date) %>% 
  # units must be named; the third positional argument of difftime() is tz
  mutate(duration = difftime(stop, start, units = "days")) %>% 
  select(-id2)

# A tibble: 4 x 4
     id start      stop       duration
  <dbl> <date>     <date>     <drtn>  
1     1 1998-10-03 1999-05-01 210 days
2     1 1999-11-09 2000-04-15 158 days
3     2 1998-11-12 1999-05-25 194 days
4     2 1999-12-01 2000-06-01 183 days
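
The pairing above relies on each id having an even number of rows that strictly alternate between start and stop once sorted by date. As a hedged alternative, here is a minimal sketch that uses `dplyr::lead()` to look one row ahead within each id and keeps only a start that is immediately followed by a stop. The helper column `next_measure` and the object `result` are names introduced here for illustration, and the test data is rebuilt with `tibble()` only so that the block is self-contained.

```r
library(dplyr)

# Same test data as in the question, built directly so the sketch runs on its own
test <- tibble(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  measure = c("start", "stop", "stop", "start",
              "stop", "stop", "start", "start"),
  measure_date = as.Date(c("1998-10-03", "1999-05-01", "2000-04-15", "1999-11-09",
                           "2000-06-01", "1999-05-25", "1999-12-01", "1998-11-12"))
)

result <- test %>%
  group_by(id) %>%
  # sort by date within each id
  arrange(measure_date, .by_group = TRUE) %>%
  # look one row ahead within each id (next_measure is an illustrative name)
  mutate(next_measure = lead(measure),
         stop = lead(measure_date)) %>%
  ungroup() %>%
  # keep only a start that is immediately followed by a stop;
  # the last row of each id has NA here and is dropped
  filter(measure == "start", next_measure == "stop") %>%
  transmute(id,
            start = measure_date,
            stop,
            running_time_in_days = as.numeric(difftime(stop, start, units = "days")))

print(result)
```

On the test data this should give the same four durations as the output above (210, 158, 194 and 183 days), stored as plain numbers in `running_time_in_days`. An unmatched start or stop is silently dropped rather than mispaired, which may or may not be the behavior you want for incomplete records.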