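I'm having a problem with the dplyr library. I need to compute, for each repeated ID, the time gap between one record's START and the previous record's End. When I run the code below it looks correct, but the gap is not computed for ID = R3: every R3 row comes back as NA. How can I fix this?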
ID<-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START<-c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
"6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End<-c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
"6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event<-c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
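library(dplyr)
# Build the data frame, keeping ID and event as character rather than factor
df <- data.frame(ID, START, End, event, stringsAsFactors = FALSE)
# Parse the day-month-year strings into Date objects
df$START <- as.Date(df$START, format = '%d-%m-%Y')
df$End <- as.Date(df$End, format = '%d-%m-%Y')
# Within each ID, sorted by date, take the gap between a row's START and the previous row's End
df %>%
  arrange(ID, START, End) %>%
  group_by(ID) %>%
  mutate(laggedTimeElapsed = difftime(START, lag(End), units = 'days'))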
Result:
   ID    START      End        event laggedTimeElapsed
   (chr) (date)     (date)     (chr) (dfft)
1  R1    2013-04-03 2014-04-03 a     NA days
2  R2    1990-04-04 1990-09-04 c     NA days
3  R2    1998-03-03 1998-07-03 c     2737 days
4  R2    2014-08-07 2014-12-07 b     5879 days
5  R2    2015-05-04 2015-05-05 b     148 days
6  R2    2018-05-04 2019-05-04 b     1095 days
7  R3    1990-07-07 1990-09-07 s     NA days
8  R3    2011-06-04 2013-06-04 s     NA days
9  R3    2012-05-05 2014-05-05 s     NA days
10 R3    2014-05-04 2014-05-06 a     NA days
11 R3    2014-07-05 2014-08-05 a     NA days
12 R3    2016-06-06 2016-08-06 a     NA days
13 R4    1999-04-23 2010-04-23 f     NA days
14 R4    2010-09-01 2012-09-01 f     131 days
15 R4    2011-06-03 2013-06-03 b     -456 days
16 R4    2011-06-25 2015-06-25 b     -709 days
17 R5    1970-04-22 1970-07-22 m     NA days
18 R6    1984-05-23 1984-08-23 a     NA days
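Answer 0 (score: 0)
We can use data.table instead. The gap is computed by group (ID) over rows ordered by ID, START and End, with shift(End) supplying the previous row's End: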
library(data.table)
setDT(df)[order(ID, START, End),laggedTimeElapsed:= difftime(START,
shift(End), units='days') , ID]
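To check either result without relying on dplyr or data.table, the same per-ID gap can be sketched in base R. This is only a cross-check under stated assumptions, not an explanation of why the dplyr call returned NA for R3. It rebuilds the question's data, and the names `prev_end` and `gap_days` are introduced here purely for illustration.

```r
# Data copied from the question
ID <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START <- c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
           "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End <- c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
         "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")

df <- data.frame(ID, START, End, stringsAsFactors = FALSE)
df$START <- as.Date(df$START, format = "%d-%m-%Y")
df$End   <- as.Date(df$End,   format = "%d-%m-%Y")

# Sort so that each ID's rows are in chronological order
df <- df[order(df$ID, df$START, df$End), ]

# Previous End within each ID, as days since the epoch (NA on the first row of a group)
prev_end <- ave(as.numeric(df$End), df$ID,
                FUN = function(x) c(NA, x[-length(x)]))

# Gap in days between this row's START and the previous row's End (illustrative column name)
df$gap_days <- as.numeric(df$START) - prev_end

print(df[df$ID == "R3", ])
```

Only the first row of each ID should be NA here, so any later R3 row that is NA in the dplyr output points at the dplyr call or the loaded packages rather than at the data.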