Calculating time intervals for repeated IDs in R

Time: 2016-01-06 10:36:36

Tags: r time intervals

I have a large dataset.

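  1. I want to create a column showing, for each repeated ID, the number of days between a row's start date and the end date of the previous row. For example, R1 has no repeats, so no interval is calculated for it. For R2, I first need to sort the rows by start date. Then I calculate the number of days between the second-earliest start date and the end date of the previous row. Next, I calculate the number of days between the third-earliest start date and the end date of the row with the second-earliest start date, and so on. I want to do the same for every other repeated ID.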

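  2. Then I want to create a new column that counts the days in the same way as in part 1, but only among rows with the same ID and the same event level. I would like to know how to do this.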

    ID<-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
    START<-c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
        "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
    End<-c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
        "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
    event<-c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
    df<-data.frame(ID,START,End,event)
    

So the result would look like this:

       ID     START       End event    Time1                   Time2
    1  R1  3-4-2013  3-4-2014     a     NA                        NA
    14 R2  3-3-1998  3-7-1998     c     NA                        NA
    15 R2  4-4-1990  4-9-1990     c    (4-4-1990)-(3-7-1998)   (4-4-1990)-(3-7-1998)
    3  R2  4-5-2015  5-5-2015     b    (4-5-2015)-(4-9-1990)      NA
    2  R2  4-5-2018  4-5-2019     b    (4-5-2018)-(5-5-2015)   (4-5-2018)-(5-5-2015)
    16 R2  7-8-2014 7-12-2014     b    (7-8-2014)-(4-5-2019)   (7-8-2014)-(4-5-2019)
    10 R3  4-5-2014  6-5-2014     a     NA                        NA
    4  R3  4-6-2011  4-6-2013     s    (4-6-2011)-(6-5-2014)      NA
    5  R3  5-5-2012  5-5-2014     s    (5-5-2012)-(4-6-2013)   (5-5-2012)-(4-6-2013)                    
    12 R3  5-7-2014  5-8-2014     a    (5-7-2014)-(5-5-2014)   (5-7-2014)-(6-5-2014)
    11 R3  6-6-2016  6-8-2016     a    (6-6-2016)-(5-8-2014)   (6-6-2016)-(5-8-2014)
    13 R3  7-7-1990  7-9-1990     s                            (7-7-1990)-(5-5-2014)
    6  R4  1-9-2010  1-9-2012     f
    7  R4 23-4-1999 23-4-2010     f
    8  R4 25-6-2011 25-6-2015     b
    9  R4  3-6-2011  3-6-2013     b
    17 R5 22-4-1970 22-7-1970     m
    18 R6 23-5-1984 23-8-1984     a

1 answer:

Answer 0: (score: 1)

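One way to do this is with the dplyr package, as follows (after fixing the data frame so that the dates are stored as Date values, as shown):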

library(dplyr)
df <- data.frame(ID, START, End, event, stringsAsFactors = FALSE)
# Parse the day-month-year strings into Date objects
df$START <- as.Date(df$START, format = '%d-%m-%Y')
df$End <- as.Date(df$End, format = '%d-%m-%Y')
# Within each ID, days between this row's START and the previous row's End
df %>% arrange(ID, START, End) %>% group_by(ID) %>%
  mutate(laggedTimeElapsed = difftime(START, lag(End), units = 'days'))
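
As a quick sanity check of what the pipeline above computes, a single interval can be worked out by hand in base R. The sketch below takes two values from the question's data for R2 (the second-earliest START, 3-3-1998, and the End of the earliest row, 4-9-1990); the variable names are only illustrative.

```r
# Second-earliest START for R2 and the End of R2's earliest row,
# both copied from the question's sample data (day-month-year strings)
start_2  <- as.Date("3-3-1998", format = "%d-%m-%Y")
prev_end <- as.Date("4-9-1990", format = "%d-%m-%Y")

# difftime() on two Date objects returns the elapsed time in the requested units
gap <- difftime(start_2, prev_end, units = "days")
print(gap)  # Time difference of 2737 days
```

A negative value would mean the row starts before the previous row has ended, that is, the two periods overlap.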

I'm not sure what you want in #2 above, but if you want to create an 'event duration' for each row, you can simply do the following:

df %>% arrange(ID, START, End) %>% group_by(ID) %>%
  mutate(laggedTimeElapsed = difftime(START, lag(End), units = 'days'), eventDuration = difftime(End, START, units = 'days'))

Output:

Source: local data frame [18 x 6]
Groups: ID [6]

      ID      START        End event laggedTimeElapsed eventDuration
   (chr)     (date)     (date) (chr)            (dfft)        (dfft)
1     R1 2013-04-03 2014-04-03     a           NA days      365 days
2     R2 1990-04-04 1990-09-04     c           NA days      153 days
3     R2 1998-03-03 1998-07-03     c         2737 days      122 days
4     R2 2014-08-07 2014-12-07     b         5879 days      122 days
5     R2 2015-05-04 2015-05-05     b          148 days        1 days
6     R2 2018-05-04 2019-05-04     b         1095 days      365 days
7     R3 1990-07-07 1990-09-07     s           NA days       62 days
8     R3 2011-06-04 2013-06-04     s         7575 days      731 days
9     R3 2012-05-05 2014-05-05     s         -395 days      730 days
10    R3 2014-05-04 2014-05-06     a           -1 days        2 days
11    R3 2014-07-05 2014-08-05     a           60 days       31 days
12    R3 2016-06-06 2016-08-06     a          671 days       61 days
13    R4 1999-04-23 2010-04-23     f           NA days     4018 days
14    R4 2010-09-01 2012-09-01     f          131 days      731 days
15    R4 2011-06-03 2013-06-03     b         -456 days      731 days
16    R4 2011-06-25 2015-06-25     b         -709 days     1461 days
17    R5 1970-04-22 1970-07-22     m           NA days       91 days
18    R6 1984-05-23 1984-08-23     a           NA days       92 days
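
For part 2 of the question, one possible reading is that the same lagged difference should be taken among rows that share both the ID and the event level. The following is a minimal base-R sketch under that assumption, written without dplyr so it runs without extra packages. The helper `gap_days` and the column names `Time1` and `Time2` are hypothetical names chosen for illustration; with dplyr, the same idea would amount to grouping by both columns, for example `group_by(ID, event)`.

```r
# Sample data from the question
ID <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START <- c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
           "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End <- c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
         "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event <- c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
df <- data.frame(ID, START, End, event, stringsAsFactors = FALSE)
df$START <- as.Date(df$START, format = "%d-%m-%Y")
df$End   <- as.Date(df$End,   format = "%d-%m-%Y")

# Hypothetical helper: for each row, days between its START and the End of the
# previous row in the same group, with rows ordered by START within each group.
# The first row of every group gets NA. Results are returned in the original row order.
gap_days <- function(data, group_cols) {
  key <- do.call(paste, c(data[group_cols], sep = "|"))  # one label per group
  ord <- order(key, data$START, data$End)
  n <- nrow(data)
  k <- key[ord]
  start <- data$START[ord]
  end <- data$End[ord]
  gap <- rep(NA_real_, n)
  if (n > 1) {
    same_group <- k[-1] == k[-n]              # is the previous row in the same group?
    d <- as.numeric(start[-1] - end[-n])      # Date minus Date is a difference in days
    gap[-1] <- ifelse(same_group, d, NA_real_)
  }
  out <- numeric(n)
  out[ord] <- gap                             # undo the sort
  out
}

df$Time1 <- gap_days(df, "ID")                # part 1: same ID
df$Time2 <- gap_days(df, c("ID", "event"))    # part 2 (assumed): same ID and same event
df[order(df$ID, df$START, df$End), ]
```

Under this reading, the R3 row starting 2014-05-04 gets -1 in `Time1` (it follows the "s" row ending 2014-05-05) but NA in `Time2`, because it is the earliest "a" event for R3.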