I have a dataset:
here_dat <- '
ID,Event,Date
1,Pre-trans,01-01-2018
1,Event1 start,09-01-2018
1,Trans,19-01-2018
1,Trans,09-01-2018
1,Event1 end,19-01-2018
1,Post-trans,20-01-2018
1,Event2 start,21-01-2018
1,Trans,22-01-2018
1,Trans,23-01-2018
2,Pre-trans,01-01-2018
2,Event1 start,07-01-2018
3,Pre-trans,01-01-2018
3,Event2 start,09-01-2018
3,Trans,11-01-2018
3,Trans,13-01-2018
3,Trans,14-01-2018
3,Trans,17-01-2018
3,Event2 end,19-01-2018
3,Event1 start,25-01-2018
3,Event1 end,27-02-2018
'
events <- read.table(text=here_dat, sep=",", header=TRUE, stringsAsFactors=FALSE)
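I want to calculate the duration of each event type for each ID. If an event has no end date, the current date should be used as the end date.
Ideal output: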
ID Event.type Event.startDate Duration
1 Event1 09-01-2018 10
1 Event2 21-01-2018 138
2 Event1 07-01-2018 152
3 Event2 09-01-2018 10
3 Event1 25-01-2018 2
Answer 0 (score: 1)
You can try
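library(tidyverse)
events %>%
  as_tibble() %>%
  # parse the day-month-year strings into Date objects
  mutate(Date = as.Date(Date, format = "%d-%m-%Y")) %>%
  # split e.g. "Event1 start" into type (a) and phase (b); rows without a phase get NA
  separate(Event, letters[1:2], sep = " ", fill = "right") %>%
  filter(grepl("Event", a)) %>%
  # one column per phase: start and end
  spread(b, Date) %>%
  # events without an end date run until today
  mutate(Duration = ifelse(is.na(end), Sys.Date() - start, end - start))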
# A tibble: 5 x 5
ID a end start Duration
<int> <chr> <date> <date> <dbl>
1 1 Event1 2018-01-19 2018-01-09 10
2 1 Event2 NA 2018-01-21 138
3 2 Event1 NA 2018-01-07 152
4 3 Event1 2018-02-27 2018-01-25 33
5 3 Event2 2018-01-19 2018-01-09 10
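The idea is to put start and end into separate columns so that the two dates can be subtracted easily. Here we use tidyverse functions to transform the Date and Event columns: first separate splits Event into a type and a phase, then filter keeps only the "Event" rows, and finally spread turns the start and end phases into columns.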
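The same steps can also be sketched with `pivot_wider()`, which tidyr documents as the successor to `spread()`. This is only a minimal sketch under a few assumptions: it uses a subset of the question's data so that it runs on its own, the column names `type` and `phase` are illustrative, and the fixed `ref_date` of 2018-06-08 stands in for `Sys.Date()` (it is chosen here because it reproduces the 138 and 152 days shown in the output above).

```r
library(dplyr)
library(tidyr)

# Subset of the question's data, so the block is self-contained
events <- read.csv(text = "ID,Event,Date
1,Pre-trans,01-01-2018
1,Event1 start,09-01-2018
1,Trans,19-01-2018
1,Event1 end,19-01-2018
1,Event2 start,21-01-2018
2,Event1 start,07-01-2018", stringsAsFactors = FALSE)

# Assumption: a fixed reference date instead of Sys.Date(), for reproducible results
ref_date <- as.Date("2018-06-08")

durations <- events %>%
  mutate(Date = as.Date(Date, format = "%d-%m-%Y")) %>%
  # fill = "right" pads rows such as "Trans" with NA instead of warning
  separate(Event, into = c("type", "phase"), sep = " ", fill = "right") %>%
  filter(grepl("^Event", type)) %>%
  # one column per phase (start, end); a missing end becomes NA
  pivot_wider(names_from = phase, values_from = Date) %>%
  # fall back to the reference date when an event has no end
  mutate(Duration = as.numeric(coalesce(end, ref_date) - start))

print(durations)
```

Using `coalesce()` instead of `ifelse()` keeps the dates as Date objects until the subtraction, and `as.numeric()` makes the conversion from a difftime to a number of days explicit. Replace `ref_date` with `Sys.Date()` to measure open events up to today.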