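I have a data set with IDs and date ranges.
Wherever the date ranges for an ID overlap, I want to merge the overlapping rows into a single row (i.e. combine the overlapping dates).
Note: some IDs have only one row, so nothing needs to change for them. Others have rows that do not overlap, and those must stay as they are (i.e. remain two rows).
Sample data: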
ID Start End
1 2007-02-01 2007-03-03
1 2007-03-01 2007-03-31
1 2007-09-01 2008-07-31
6 2011-02-05 2011-03-12
5 2012-11-16 2012-12-26
4 2015-01-03 2015-02-14
3 2008-08-02 2008-09-11
7 2010-09-22 2010-10-22
7 2010-09-24 2010-10-24
7 2010-09-26 2010-10-26
7 2010-09-28 2010-10-28
Desired output:
ID Start End
1 2007-02-01 2007-03-31
1 2007-09-01 2008-07-31
6 2011-02-05 2011-03-12
5 2012-11-16 2012-12-26
4 2015-01-03 2015-02-14
3 2008-08-02 2008-09-11
7 2010-09-22 2010-10-28
Answer 0 (score: 2)
Based on the example, after grouping by 'ID' we take the first of 'start' and the last of 'end':
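library(dplyr)
# assumes the columns are named start and end and that rows are
# already ordered by start within each ID
df1 %>%
  group_by(ID) %>%
  summarise(start = first(start), end = last(end))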
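To see what this does on the question's sample, here is a small base R sketch of the same first/last idea. Base R is my choice to keep it dependency-free, and the name `collapsed` and the reduced data (only IDs 1 and 7) are illustrative. Note that it returns exactly one row per ID, so ID 1's separate 2007-09-01 interval gets folded in too, which is not what the desired output shows.

```r
# Rows for ID 1 and ID 7 copied from the question's sample data
df <- data.frame(
  ID    = c(1, 1, 1, 7, 7, 7, 7),
  start = as.Date(c("2007-02-01", "2007-03-01", "2007-09-01",
                    "2010-09-22", "2010-09-24", "2010-09-26", "2010-09-28")),
  end   = as.Date(c("2007-03-03", "2007-03-31", "2008-07-31",
                    "2010-10-22", "2010-10-24", "2010-10-26", "2010-10-28"))
)

# Base R stand-in for first(start) / last(end) per ID
collapsed <- do.call(rbind, lapply(split(df, df$ID), function(g) {
  data.frame(ID = g$ID[1], start = g$start[1], end = g$end[nrow(g)])
}))
rownames(collapsed) <- NULL
collapsed  # one row per ID: ID 1 spans 2007-02-01 to 2008-07-31
```

So this version is only safe when every ID forms a single overlapping run.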
Based on the updated example in the OP's post:
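library(dplyr)       # mutate_at(), group_by(), summarise(), lead()
library(data.table)  # rleid()
df1 %>%
  # parse the dd/mm/yy strings in columns 2:3 (start, end) as Date
  mutate_at(2:3, as.Date, format = "%d/%m/%y") %>%
  group_by(ID) %>%
  # within each ID, rows sharing a run of the "next start < this end" flag form one group
  # (.add = TRUE needs dplyr >= 1.0.0; older versions spell it add = TRUE)
  group_by(grp = rleid(lead(start, default = last(start)) < end), .add = TRUE) %>%
  summarise(start = first(start), end = last(end)) %>%
  ungroup() %>%
  select(-grp) %>%
  # convert the dates back to dd/mm/yy strings
  mutate_at(2:3, format, format = "%d/%m/%y")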
# A tibble: 7 x 3
#      ID start    end
#   <int> <chr>    <chr>
# 1    84 27/03/09 21/07/17
# 2    92 20/04/12 25/01/17
# 3   108 12/12/14 25/08/17
# 4   111 31/01/14 18/11/16
# 5   114 10/04/13 15/07/13
# 6   130 05/01/11 04/03/12
# 7   130 15/05/12 27/09/13
# data used for the output above (dates stored as dd/mm/yy strings)
df1 <- structure(list(ID = c(84L, 84L, 92L, 92L, 92L, 108L, 111L, 114L,
130L, 130L), start = c("27/03/09", "23/02/13", "20/04/12", "18/07/14",
"5/12/15", "12/12/14", "31/01/14", "10/04/13", "5/01/11", "15/05/12"
), end = c("24/03/13", "21/07/17", "17/08/14", "4/01/16", "25/01/17",
"25/08/17", "18/11/16", "15/07/13", "4/03/12", "27/09/13")), .Names = c("ID",
"start", "end"), class = "data.frame", row.names = c(NA, -10L
))
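One caveat, as far as I can tell: the run-length grouping above compares each row only with the row after it. An ID where an overlap is followed by a gap (like ID 1 in the question's original sample) produces the flags TRUE, FALSE, TRUE and so ends up split into three groups instead of two. A different technique that handles this case is to sort by start and open a new block whenever a row starts after the running maximum of all earlier end dates. Below is a minimal base R sketch of that idea. `merge_intervals()` is a hypothetical helper, not part of any package, and treating touching intervals (start equal to a previous end) as overlapping is my assumption.

```r
# Sample rows copied from the question
df <- data.frame(
  ID    = c(1, 1, 1, 6, 5, 4, 3, 7, 7, 7, 7),
  start = as.Date(c("2007-02-01", "2007-03-01", "2007-09-01", "2011-02-05",
                    "2012-11-16", "2015-01-03", "2008-08-02", "2010-09-22",
                    "2010-09-24", "2010-09-26", "2010-09-28")),
  end   = as.Date(c("2007-03-03", "2007-03-31", "2008-07-31", "2011-03-12",
                    "2012-12-26", "2015-02-14", "2008-09-11", "2010-10-22",
                    "2010-10-24", "2010-10-26", "2010-10-28"))
)

# Hypothetical helper: merge overlapping [start, end] rows of one ID
merge_intervals <- function(g) {
  g <- g[order(g$start, g$end), ]
  s <- as.numeric(g$start)
  e <- as.numeric(g$end)
  # latest end among all earlier rows (-Inf for the first row)
  prev_max_end <- c(-Inf, head(cummax(e), -1))
  # a new block opens when a row starts after everything before it has ended
  block <- cumsum(s > prev_max_end)
  block_start <- unname(vapply(split(s, block), min, numeric(1)))
  block_end   <- unname(vapply(split(e, block), max, numeric(1)))
  data.frame(
    ID    = g$ID[1],
    start = as.Date(block_start, origin = "1970-01-01"),
    end   = as.Date(block_end, origin = "1970-01-01")
  )
}

merged <- do.call(rbind, lapply(split(df, df$ID), merge_intervals))
rownames(merged) <- NULL
merged  # rows come out ordered by ID because split() sorts the groups
```

On the sample this gives seven rows: ID 1 keeps two intervals (2007-02-01 to 2007-03-31 and 2007-09-01 to 2008-07-31) and ID 7 collapses to 2010-09-22 to 2010-10-28, matching the desired output apart from row order. Using the running maximum rather than just the previous row's end also covers an interval nested inside an earlier, longer one.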