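I've seen many threads on how to collapse consecutive dates into a single row and have tried a few of them (including this one, and using lead from dplyr), but so far I haven't found a thread that answers my specific question.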
Here is my data:
df <- data.frame(
  id     = c("A", "A", "A", "B", "B", "C", "C", "C"),
  start  = as.Date(c("2013-05-21", "2014-03-17", "2014-12-12", "2009-03-08",
                     "2011-07-30", "2008-10-07", "2009-11-21", "2010-12-01")),
  end    = as.Date(c("2014-03-16", "2014-12-11", NA, "2011-07-14",
                     NA, "2009-11-20", NA, NA)),
  status = c("expired", "expired", "active", "expired",
             "active", "expired", "expired", "active")
)
Here is the output I want:
id  start       end         status
A   2013-05-21  NA          active
B   2009-03-08  2011-07-14  expired
B   2011-07-30  NA          active
C   2008-10-07  NA          active
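So there are three things I want to do:
1) If rows are consecutive, i.e. the end date + 1 is the start date of the next row, I want to collapse them into one row (as in id A).
2) If rows are not consecutive, i.e. the end date + 1 is not the start date of the next row, I want to keep them as separate rows (as in id B).
3) If an "expired" row has no end date, I still want it collapsed into one row with the rows that follow (as in id C).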
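To make rules 1 to 3 concrete, here is a small base-R sketch of the consecutiveness test on its own. The helper name is_consecutive is hypothetical (it is not from the question or any package), and it assumes the rows of one id are already sorted by start. It relies only on the fact that adding 1 to a Date gives the next calendar day. For id C the second comparison comes out NA because the middle "expired" row has no end date, which is exactly the case rule 3 has to handle separately.

```r
# Hypothetical helper: element i is TRUE when row i + 1 starts exactly one day
# after row i ends, FALSE when there is a gap, and NA when row i has no end date.
# Assumes the rows of one id are already sorted by start.
is_consecutive <- function(start, end) {
  n <- length(start)
  # Date + 1 is the following calendar day
  start[-1] == end[-n] + 1
}

# id A: both transitions are consecutive (returns TRUE TRUE)
is_consecutive(
  as.Date(c("2013-05-21", "2014-03-17", "2014-12-12")),
  as.Date(c("2014-03-16", "2014-12-11", NA))
)

# id B: gap between 2011-07-14 and 2011-07-30 (returns FALSE)
is_consecutive(
  as.Date(c("2009-03-08", "2011-07-30")),
  as.Date(c("2011-07-14", NA))
)

# id C: the middle row has no end date, so the second value is NA (returns TRUE NA)
is_consecutive(
  as.Date(c("2008-10-07", "2009-11-21", "2010-12-01")),
  as.Date(c("2009-11-20", NA, NA))
)
```

The NA for id C is why a plain "end + 1 == next start" test is not enough here: the missing end date has to be filled in (or treated as "continues") before the rows can be grouped.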
Any help would be greatly appreciated!
Answer 0 (score: 1)
You could go with something like this:
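library(tidyverse)

# Assumes rows are already sorted by start within each id, as in the sample data
df %>%
  group_by(id) %>%
  mutate(
    # An open-ended row (end = NA) is treated as running until the next row
    # starts; the last row of each id keeps NA
    end = if_else(is.na(end), lead(start), end),
    # 0 = continues the previous row (starts no later than previous end + 1),
    # 1 = starts a new block
    flag = if_else(start <= lag(end) + 1, 0, 1),
    # The first row of each id has no previous row, so lag() gives NA
    flag = if_else(is.na(flag), 0, flag),
    # The running sum of the flags numbers each block of consecutive rows
    group = cumsum(flag),
    flag = NULL
  ) %>%
  group_by(id, group) %>%
  mutate(
    # Each block takes its first start and its last end and status
    start = first(start),
    end = last(end),
    status = last(status)
  ) %>%
  ungroup() %>%
  # The rows of each block are now identical, so keep one of them
  distinct(id, start, end, status)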
Output:
# A tibble: 4 x 4
id start end status
<fct> <date> <date> <fct>
1 A 2013-05-21 NA active
2 B 2009-03-08 2011-07-14 expired
3 B 2011-07-30 NA active
4 C 2008-10-07 NA active
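A variant of the same idea, sketched under the assumption that dplyr >= 1.0.0 is available (for the .groups argument of summarise()). It sorts by start first, because lead() and lag() depend on row order, and it uses summarise() so each block collapses to one row directly instead of relying on mutate() followed by distinct(). The column names new_run and run are illustrative only; they are not part of the answer above.

```r
library(dplyr)

df <- data.frame(
  id     = c("A", "A", "A", "B", "B", "C", "C", "C"),
  start  = as.Date(c("2013-05-21", "2014-03-17", "2014-12-12", "2009-03-08",
                     "2011-07-30", "2008-10-07", "2009-11-21", "2010-12-01")),
  end    = as.Date(c("2014-03-16", "2014-12-11", NA, "2011-07-14",
                     NA, "2009-11-20", NA, NA)),
  status = c("expired", "expired", "active", "expired",
             "active", "expired", "expired", "active")
)

res <- df %>%
  # lead()/lag() below depend on row order, so sort explicitly
  arrange(id, start) %>%
  group_by(id) %>%
  mutate(
    # Rule 3: an open-ended row is treated as running until the next start
    end = if_else(is.na(end), lead(start), end),
    # TRUE when there is a gap of more than one day since the previous end;
    # the first row of each id has no previous end, so NA becomes FALSE
    new_run = coalesce(start > lag(end) + 1, FALSE),
    # Running count of gaps numbers each block of consecutive rows
    run = cumsum(new_run)
  ) %>%
  group_by(id, run) %>%
  # One row per block: first start, last end, last status
  summarise(
    start = first(start),
    end = last(end),
    status = last(status),
    .groups = "drop"
  ) %>%
  select(-run)

print(res)
```

On the sample data this should give the same four rows as the output above; the explicit arrange() only matters when the input is not already sorted by start within each id.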