Fixing a problem with library(dplyr) in R

Time: 2016-01-06 18:13:56

Tags: r algorithm

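I'm having a problem using library(dplyr). For each repeated ID, I need to calculate the time interval between one record's End and the next record's START. The code below looks correct to me, but it does not calculate the interval for ID = R3: every R3 row comes back as NA. How can I fix this?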

ID<-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START<-c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
    "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End<-c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
    "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event<-c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
df<-data.frame(ID,START,End,event)

library(dplyr)
df <- data.frame(ID, START, End, event, stringsAsFactors = FALSE)
# Parse the day-month-year strings into Date objects
df$START <- as.Date(df$START, format = '%d-%m-%Y')
df$End <- as.Date(df$End, format = '%d-%m-%Y')
# Per ID: days from the previous row's End to the current row's START
df %>%
  arrange(ID, START, End) %>%
  group_by(ID) %>%
  mutate(laggedTimeElapsed = difftime(START, lag(End), units = 'days'))

Result:

      ID      START        End event laggedTimeElapsed
   (chr)     (date)     (date) (chr)            (dfft)
1     R1 2013-04-03 2014-04-03     a           NA days
2     R2 1990-04-04 1990-09-04     c           NA days
3     R2 1998-03-03 1998-07-03     c         2737 days
4     R2 2014-08-07 2014-12-07     b         5879 days
5     R2 2015-05-04 2015-05-05     b          148 days
6     R2 2018-05-04 2019-05-04     b         1095 days
7     R3 1990-07-07 1990-09-07     s           NA days
8     R3 2011-06-04 2013-06-04     s           NA days
9     R3 2012-05-05 2014-05-05     s           NA days
10    R3 2014-05-04 2014-05-06     a           NA days
11    R3 2014-07-05 2014-08-05     a           NA days
12    R3 2016-06-06 2016-08-06     a           NA days
13    R4 1999-04-23 2010-04-23     f           NA days
14    R4 2010-09-01 2012-09-01     f          131 days
15    R4 2011-06-03 2013-06-03     b         -456 days
16    R4 2011-06-25 2015-06-25     b         -709 days
17    R5 1970-04-22 1970-07-22     m           NA days
18    R6 1984-05-23 1984-08-23     a           NA days

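1 answer:

Answer 0 (score: 0)

We can use data.table, sorting by ID, START, and End and then computing the lagged difference within each ID: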

library(data.table)
setDT(df)[order(ID, START, End),laggedTimeElapsed:=  difftime(START,
                               shift(End), units='days') , ID]
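
As a dependency-free cross-check of the same per-ID lag, here is a minimal base R sketch. It is not part of the original answer: the helper name `prev_end` is made up for illustration, and the result is stored as a plain number of days rather than a difftime. It assumes the same day-month-year input strings as the question.

```r
# Rebuild the question's data (same values as in the question).
ID <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START <- c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
           "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End <- c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
         "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event <- c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")

df <- data.frame(ID, START, End, event, stringsAsFactors = FALSE)
df$START <- as.Date(df$START, format = "%d-%m-%Y")
df$End <- as.Date(df$End, format = "%d-%m-%Y")

# Sort so that "previous row" means the previous record of the same ID.
df <- df[order(df$ID, df$START, df$End), ]

# prev_end (hypothetical name): End of the previous row within each ID,
# as a day count; the first row of every ID has no predecessor, so it gets NA.
prev_end <- ave(as.numeric(df$End), df$ID,
                FUN = function(x) c(NA, head(x, -1)))

# Days from the previous End to the current START, as a plain numeric.
df$laggedTimeElapsed <- as.numeric(df$START) - prev_end

# Inspect the group that was all NA in the question.
print(df[df$ID == "R3", ])
```

If this runs as intended, the R2 and R4 values should match the output shown in the question, and R3 should be NA only in its first row. That gives a reference to compare the dplyr and data.table results against.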