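I have a table with an ID, a measure (start or stop), and a date. I need to mutate a new column called "running_time_in_days" that finds, for each ID, the difference between a start date in one year and the stop date in the following year (the process runs across January 1, so each duration spans the new year).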
library(readr)
library(dplyr)
library(lubridate)
# read in some test data
test <- read_csv("id, measure, measure_date
1, start, 1998-10-3
1, stop, 1999-5-1
1, stop, 2000-4-15
1, start, 1999-11-9
2, stop, 2000-6-1
2, stop, 1999-5-25
2, start, 1999-12-1
2, start, 1998-11-12")
# parse "measure_date" as a Date; ymd() accepts character or Date input
test$measure_date <- ymd(test$measure_date)
The output might look something like this:
id  start       stop        duration
1   1998-10-03  1999-05-01  stop - start (days)
1   1999-11-09  2000-04-15  stop - start (days)
2   1998-11-12  1999-05-25  stop - start (days)
2   1999-12-01  2000-06-01  stop - start (days)
Answer 0 (score: 0)
See if this works for you:
test %>%
  # create unique row identifier for wide form, since there are
  # multiple start / stop dates associated with each id
  group_by(id) %>%
  arrange(measure_date) %>%
  mutate(id2 = rep(seq(1, n()/2), each = 2)) %>%
  ungroup() %>%
  # convert to wide form & calculate difference between dates
  tidyr::spread(measure, measure_date) %>%
  # units must be passed by name; the third positional argument of difftime() is tz
  mutate(duration = difftime(stop, start, units = "days")) %>%
  select(-id2)
# A tibble: 4 x 4
     id start      stop       duration
  <dbl> <date>     <date>     <drtn>
1     1 1998-10-03 1999-05-01 210 days
2     1 1999-11-09 2000-04-15 158 days
3     2 1998-11-12 1999-05-25 194 days
4     2 1999-12-01 2000-06-01 183 days
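
For what it's worth, tidyr now marks spread() as superseded in favour of pivot_wider(), so here is a rough sketch of the same idea using that function. It assumes the starts and stops of each id strictly alternate, so that the n-th start pairs with the n-th stop. The helper column `period` is a hypothetical name introduced only for this illustration, and the result column is named `running_time_in_days` as in the question.

```r
library(dplyr)
library(tidyr)

# Same test data as in the question, built directly as a tibble
test <- tibble(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  measure = c("start", "stop", "stop", "start",
              "stop", "stop", "start", "start"),
  measure_date = as.Date(c("1998-10-03", "1999-05-01", "2000-04-15", "1999-11-09",
                           "2000-06-01", "1999-05-25", "1999-12-01", "1998-11-12"))
)

result <- test %>%
  # number the starts and the stops separately within each id, in date order
  # (assumption: the n-th start belongs with the n-th stop)
  group_by(id, measure) %>%
  arrange(measure_date, .by_group = TRUE) %>%
  mutate(period = row_number()) %>%  # hypothetical helper column
  ungroup() %>%
  # one row per id and period, with "start" and "stop" as columns
  pivot_wider(names_from = measure, values_from = measure_date) %>%
  # plain numeric number of days between stop and start
  mutate(running_time_in_days = as.numeric(difftime(stop, start, units = "days"))) %>%
  select(id, start, stop, running_time_in_days)

print(result)
```

Numbering starts and stops separately, rather than pairing consecutive rows, means a data error shows up as an NA in start or stop when an id has unequal numbers of each. It does not check that each stop actually falls after its start, so that may be worth verifying on real data.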