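I have a dataset of people's attendance at appointments. When someone misses an appointment, I want to calculate the number of days until they next attend, or return NA if they never attend again.
While writing up this question I came up with a solution: calculate the days between events, then take a reverse cumulative sum of those gaps (see here), grouped by patient and by change in attendance status (see here). I'm posting it in case it helps someone else, or in case someone spots an error or can suggest a better approach.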
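The two building blocks mentioned above can be sketched in base R on a toy example. The vectors `gaps` and `event` below are made up for illustration and are not part of the real data; this is only a minimal sketch of the idea, not the full solution.

```r
# Toy vectors, invented for illustration only
gaps  <- c(39, 2, 144, 176)            # days until the next appointment
event <- c(FALSE, FALSE, TRUE, TRUE)   # attended (TRUE) or missed (FALSE)

# Reverse cumulative sum: each element becomes the sum of itself
# and every element after it
rev_cumsum <- rev(cumsum(rev(gaps)))

# Run id: increases by one whenever the attendance status differs
# from the previous row, so consecutive equal values share an id
run_id <- cumsum(c(TRUE, event[-1] != event[-length(event)]))

print(rev_cumsum)  # 361 322 320 176
print(run_id)      # 1 1 2 2
```

Taking the reverse cumulative sum within each run id, rather than over the whole vector, is what limits the total to the days remaining until the status next changes.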
library(dplyr)
df <- data.frame(
  id    = rep(c("A", "B"), each = 5),
  event = c(FALSE, FALSE, TRUE, TRUE, FALSE,
            FALSE, TRUE, FALSE, TRUE, TRUE),
  date  = as.Date(c("2016-01-02", "2016-02-10", "2016-02-12", "2016-07-05", "2016-12-28",
                    "2016-01-16", "2016-02-11", "2016-02-15", "2016-04-20", "2016-10-23")))
df %>%
  # Sort data (if not already)
  arrange(id, date) %>%
  group_by(id) %>%
  mutate(
    # Calculate days before next appointment
    days_next_event = lead(date) - date,
    # Identify change in attend status
    # (a logical default keeps the type consistent with 'event')
    event_chng_n = cumsum(event != lag(event, default = TRUE))) %>%
  group_by(id, event_chng_n) %>%
  mutate(
    # Calculate days before next change in event ('cumsum' not defined for "difftime" objects)
    days_next_chng = rev(cumsum(rev(as.numeric(
      ifelse(is.na(days_next_event), 0, days_next_event)
    )))),
    # Calculate days before next success
    # (NA when no attended appointment follows within the patient's records)
    days_next_success = ifelse(event, 0, rev(cumsum(rev(
      as.numeric(days_next_event)
    )))))
Source: local data frame [10 x 7]
Groups: id, event_chng_n [7]

       id event       date days_next_event event_chng_n days_next_chng days_next_success
   (fctr) (lgl)     (date)          (dfft)        (int)          (dbl)             (dbl)
 1      A FALSE 2016-01-02         39 days            1             41                41
 2      A FALSE 2016-02-10          2 days            1              2                 2
 3      A  TRUE 2016-02-12        144 days            2            320                 0
 4      A  TRUE 2016-07-05        176 days            2            176                 0
 5      A FALSE 2016-12-28         NA days            3              0                NA
 6      B FALSE 2016-01-16         26 days            1             26                26
 7      B  TRUE 2016-02-11          4 days            2              4                 0
 8      B FALSE 2016-02-15         65 days            3             65                65
 9      B  TRUE 2016-04-20        186 days            4            186                 0
10      B  TRUE 2016-10-23         NA days            4              0                 0
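
One possible alternative, offered as a sketch rather than a verified improvement: instead of summing gaps, record the date of each attended appointment, carry it backwards over the preceding missed rows, and subtract. This assumes the tidyr package is installed and that `fill()` is applied within each group of a grouped data frame; the names `alt` and `next_success` are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is available, used only for fill()

df <- data.frame(
  id    = rep(c("A", "B"), each = 5),
  event = c(FALSE, FALSE, TRUE, TRUE, FALSE,
            FALSE, TRUE, FALSE, TRUE, TRUE),
  date  = as.Date(c("2016-01-02", "2016-02-10", "2016-02-12", "2016-07-05", "2016-12-28",
                    "2016-01-16", "2016-02-11", "2016-02-15", "2016-04-20", "2016-10-23")))

alt <- df %>%
  arrange(id, date) %>%
  group_by(id) %>%
  # Keep the date only on rows where the appointment was attended
  mutate(next_success = if_else(event, date, as.Date(NA))) %>%
  # Carry each attended date backwards over the missed rows before it
  fill(next_success, .direction = "up") %>%
  # Days until the next attended appointment (0 if attended, NA if none follows)
  mutate(days_next_success = as.numeric(next_success - date)) %>%
  ungroup()

print(alt)
```

On the sample data this should give the same days_next_success values as the output above (41, 2, 0, 0, NA for A and 26, 0, 65, 0, 0 for B). It does not produce days_next_chng, so it only covers the original question of days until the next attended appointment.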