For each ID, find overlapping date ranges, merge them into new dates and delete the extra rows

Posted: 2018-03-18 02:19:03

Tags: r date dataframe dplyr

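I have a dataset of IDs with start and end dates.

Whenever a row's date range overlaps the next row's range for the same ID, I want to merge them into a single row (i.e., combine the overlapping dates).

Note: some IDs have only one row, so nothing needs to change for them. Others have no overlaps and must stay as they are (i.e., keep both rows).

Sample data: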

ID Start      End
1  2007-02-01 2007-03-03  
1  2007-03-01 2007-03-31  
1  2007-09-01 2008-07-31  
6  2011-02-05 2011-03-12  
5  2012-11-16 2012-12-26  
4  2015-01-03 2015-02-14  
3  2008-08-02 2008-09-11  
7  2010-09-22 2010-10-22  
7  2010-09-24 2010-10-24  
7  2010-09-26 2010-10-26  
7  2010-09-28 2010-10-28

Desired output:
ID Start      End
1  2007-02-01 2007-03-31  
1  2007-09-01 2008-07-31  
6  2011-02-05 2011-03-12  
5  2012-11-16 2012-12-26  
4  2015-01-03 2015-02-14  
3  2008-08-02 2008-09-11  
7  2010-09-22 2010-10-28
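To make the question reproducible, here is a sketch that rebuilds the sample table and the desired output as R data frames. It assumes integer IDs and `Date` columns, and it uses lower-case `start`/`end` names (instead of the table's `Start`/`End`) so it lines up with the answer's code; `sample_df` and `expected_df` are hypothetical names.

```r
# Sample data from the question (lower-case column names are an assumption,
# chosen to match the answer's code)
sample_df <- data.frame(
  ID = c(1L, 1L, 1L, 6L, 5L, 4L, 3L, 7L, 7L, 7L, 7L),
  start = as.Date(c("2007-02-01", "2007-03-01", "2007-09-01", "2011-02-05",
                    "2012-11-16", "2015-01-03", "2008-08-02", "2010-09-22",
                    "2010-09-24", "2010-09-26", "2010-09-28")),
  end = as.Date(c("2007-03-03", "2007-03-31", "2008-07-31", "2011-03-12",
                  "2012-12-26", "2015-02-14", "2008-09-11", "2010-10-22",
                  "2010-10-24", "2010-10-26", "2010-10-28"))
)

# Desired output from the question, in the same row order
expected_df <- data.frame(
  ID = c(1L, 1L, 6L, 5L, 4L, 3L, 7L),
  start = as.Date(c("2007-02-01", "2007-09-01", "2011-02-05", "2012-11-16",
                    "2015-01-03", "2008-08-02", "2010-09-22")),
  end = as.Date(c("2007-03-31", "2008-07-31", "2011-03-12", "2012-12-26",
                  "2015-02-14", "2008-09-11", "2010-10-28"))
)

str(sample_df)
```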

1 Answer

Answer (score: 2)

Based on the example, after grouping by "ID" we take the first value of "start" and the last value of "end":

library(dplyr)
df1 %>%
   group_by(ID) %>%
   # one row per ID: the first start and the last end, in row order
   summarise(start = first(start), end = last(end))
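Because this summary keeps exactly one row per ID, it fits IDs whose rows all overlap (such as ID 7), but it would also fold ID 1's third, non-overlapping range (2007-09-01 to 2008-07-31) into the first one. A small check on a subset of the question's sample, assuming lower-case `start`/`end` columns of class `Date`:

```r
library(dplyr)

# Subset of the question's sample (ID 1 and two rows of ID 7)
df <- data.frame(
  ID = c(1L, 1L, 1L, 7L, 7L),
  start = as.Date(c("2007-02-01", "2007-03-01", "2007-09-01",
                    "2010-09-22", "2010-09-24")),
  end = as.Date(c("2007-03-03", "2007-03-31", "2008-07-31",
                  "2010-10-22", "2010-10-24"))
)

# The first/last summary from the answer: one row per ID, regardless of gaps
collapsed <- df %>%
  group_by(ID) %>%
  summarise(start = first(start), end = last(end))

collapsed
# ID 1 comes out as a single row, 2007-02-01 to 2008-07-31,
# although the desired output keeps two rows for it
```

This limitation is what the update below addresses.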

Update

Based on the updated example in the OP's post:

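library(dplyr)
df1 %>% 
    # parse the dd/mm/yy strings in columns 2:3 (start, end) into Date
    mutate_at(2:3, as.Date, format = "%d/%m/%y") %>%
    group_by(ID) %>% 
    # within each ID, assign a new group id every time the
    # "next start < this end" flag changes (data.table::rleid)
    group_by(grp = data.table::rleid(lead(start, default = last(start)) < end), .add = TRUE) %>% 
    summarise(start = first(start), end = last(end)) %>%
    ungroup() %>% 
    select(-grp) %>% 
    # convert the dates back to dd/mm/yy strings
    mutate_at(2:3, format, format = "%d/%m/%y")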
# A tibble: 7 x 3
#     ID start    end     
#  <int> <chr>    <chr>   
#1    84 27/03/09 21/07/17
#2    92 20/04/12 25/01/17
#3   108 12/12/14 25/08/17
#4   111 31/01/14 18/11/16
#5   114 10/04/13 15/07/13
#6   130 05/01/11 04/03/12
#7   130 15/05/12 27/09/13
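The `rleid()` grouping depends on a per-row flag (next start < this end), so two overlapping rows share a group only when their flags are equal. A quick check with ID 1 from the question's sample suggests this can split overlapping rows: the flags come out `TRUE FALSE TRUE`, which gives three groups where the desired output has two. The sketch below re-implements `lead(start, default = last(start))` and `rleid()` in base R (via `rle()`) so it runs without dplyr or data.table; it is an illustrative re-implementation, not the answer's code.

```r
# Rows for ID 1 from the question, already ordered by start
start <- as.Date(c("2007-02-01", "2007-03-01", "2007-09-01"))
end   <- as.Date(c("2007-03-03", "2007-03-31", "2008-07-31"))

# Base-R version of lead(start, default = last(start)) < end
next_start <- c(start[-1], start[length(start)])
flag <- next_start < end

# Base-R equivalent of data.table::rleid(): a new id at every change in flag
grp <- with(rle(flag), rep(seq_along(lengths), lengths))

flag  # TRUE FALSE TRUE
grp   # 1 2 3: rows 1 and 2 overlap but end up in different groups
```

On `df1` the flags happen to line up (every run of overlapping rows is all `TRUE`, and the lone non-overlapping row for ID 130 gets its own group), which is why the output above is correct for that data.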

Data

df1 <- structure(list(ID = c(84L, 84L, 92L, 92L, 92L, 108L, 111L, 114L, 
130L, 130L), start = c("27/03/09", "23/02/13", "20/04/12", "18/07/14", 
"5/12/15", "12/12/14", "31/01/14", "10/04/13", "5/01/11", "15/05/12"
), end = c("24/03/13", "21/07/17", "17/08/14", "4/01/16", "25/01/17", 
"25/08/17", "18/11/16", "15/07/13", "4/03/12", "27/09/13")), .Names = c("ID", 
"start", "end"), class = "data.frame", row.names = c(NA, -10L
 ))
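One way to handle chains of overlaps in general is a swapped-in technique, not the answer's. Sort each ID by start and open a new block whenever a row starts at or after the latest end seen so far in that ID (a running maximum via `cummax()`). Then take the minimum start and maximum end of each block. The sketch below uses `merge_overlaps()`, a hypothetical helper. It assumes `Date` columns named `start`/`end` and keeps the answer's strict `<` notion of overlap, so a range starting exactly on the previous end is not merged. The `.add` and `.groups` arguments need dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: merge overlapping [start, end] ranges within each ID
merge_overlaps <- function(df) {
  df %>%
    arrange(ID, start) %>%
    group_by(ID) %>%
    mutate(
      # latest end among all earlier rows of this ID (NA for the first row);
      # cummax() is not defined for Date, hence as.numeric()
      prev_max_end = lag(cummax(as.numeric(end))),
      # open a new block when this row starts on/after that latest end
      grp = cumsum(is.na(prev_max_end) | as.numeric(start) >= prev_max_end)
    ) %>%
    group_by(grp, .add = TRUE) %>%
    summarise(start = min(start), end = max(end), .groups = "drop") %>%
    select(-grp)
}

# Question's sample data (lower-case column names assumed)
sample_df <- data.frame(
  ID = c(1L, 1L, 1L, 6L, 5L, 4L, 3L, 7L, 7L, 7L, 7L),
  start = as.Date(c("2007-02-01", "2007-03-01", "2007-09-01", "2011-02-05",
                    "2012-11-16", "2015-01-03", "2008-08-02", "2010-09-22",
                    "2010-09-24", "2010-09-26", "2010-09-28")),
  end = as.Date(c("2007-03-03", "2007-03-31", "2008-07-31", "2011-03-12",
                  "2012-12-26", "2015-02-14", "2008-09-11", "2010-10-22",
                  "2010-10-24", "2010-10-26", "2010-10-28"))
)
merged_q <- merge_overlaps(sample_df)

# The answer's df1, with its dd/mm/yy strings parsed to Date
df1 <- data.frame(
  ID = c(84L, 84L, 92L, 92L, 92L, 108L, 111L, 114L, 130L, 130L),
  start = as.Date(c("27/03/09", "23/02/13", "20/04/12", "18/07/14", "5/12/15",
                    "12/12/14", "31/01/14", "10/04/13", "5/01/11", "15/05/12"),
                  format = "%d/%m/%y"),
  end = as.Date(c("24/03/13", "21/07/17", "17/08/14", "4/01/16", "25/01/17",
                  "25/08/17", "18/11/16", "15/07/13", "4/03/12", "27/09/13"),
                format = "%d/%m/%y")
)
merged_a <- merge_overlaps(df1)

merged_q
merged_a
```

On the question's sample this gives the seven rows of the desired output, sorted by ID rather than in the original row order. On `df1` it reproduces the seven rows printed in the answer.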