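I have the data frame below. I want to first find the position of a special event, and then calculate the time difference between that special event (Alert) and each of the next two consecutive items.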
id date type
2 2015-03-01 a
3 2015-12-12 b
2 2015-01-05 Alert
2 2015-01-15 c
2 2015-01-01 d
2 2015-12-02 a
3 2015-12-02 Alert
3 2015-12-02 a
4 2015-12-02 b
3 2015-12-12 a
...
The following data frame is the expected result:
id days_diffrence_1 days_difference_2
2 10 55
3 0 10
4 nan nan
...
I tried the following, but it does not work well:
table <- df %>%
group_by(id) %>%
summarise(days_diffrence_1 = as.numeric(date[2] - date[1]),
days_difference_2 = as.numeric(date[3] - date[1]))
Answer 0 (score: 0)
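We can use dplyr: group_by id, arrange by date, get the index of the special occurrence ("Alert") if it is present, and subtract the date at that index from the dates at the next two indices. If there is no occurrence of "Alert", we return NA.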
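library(dplyr)
df %>%
  # make sure date is of class Date so that subtraction yields days
  mutate(date = as.Date(date)) %>%
  group_by(id) %>%
  # arrange() ignores the grouping here, but sorting all rows by date also orders each id by date
  arrange(date) %>%
  # inds: position of the first "Alert" within the id, NA if there is none
  summarise(inds = if (any(type == "Alert")) which.max(type == "Alert") else NA_integer_,
            days_diffrence_1 = as.numeric(date[inds + 1] - date[inds]),
            days_diffrence_2 = as.numeric(date[inds + 2] - date[inds])) %>%
  select(-inds)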
# id days_diffrence_1 days_diffrence_2
# <int> <dbl> <dbl>
#1 2 10 55
#2 3 0 10
#3 4 NA NA
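The same index-based logic can also be sketched in base R, without dplyr. This is only an illustrative variant under stated assumptions: the helper name alert_diffs is hypothetical, the data is the sample from the question, and only the first "Alert" per id is considered.

```r
# Sample data from the question
df <- read.table(text = "id date type
2 2015-03-01 a
3 2015-12-12 b
2 2015-01-05 Alert
2 2015-01-15 c
2 2015-01-01 d
2 2015-12-02 a
3 2015-12-02 Alert
3 2015-12-02 a
4 2015-12-02 b
3 2015-12-12 a", header = TRUE, stringsAsFactors = FALSE)

df$date <- as.Date(df$date)
# order() leaves ties in their original order, so an Alert stays ahead of
# same-day rows that follow it in the input
df <- df[order(df$id, df$date), ]

# Hypothetical helper: day differences between the first Alert of one id
# and the next two rows; all NA when the id has no Alert
alert_diffs <- function(d) {
  i <- match("Alert", d$type)  # index of the first Alert, NA if absent
  data.frame(id = d$id[1],
             days_diffrence_1 = as.numeric(d$date[i + 1] - d$date[i]),
             days_diffrence_2 = as.numeric(d$date[i + 2] - d$date[i]))
}

result <- do.call(rbind, lapply(split(df, df$id), alert_diffs))
rownames(result) <- NULL
result
#   id days_diffrence_1 days_diffrence_2
# 1  2               10               55
# 2  3                0               10
# 3  4               NA               NA
```

Indexing a Date vector with NA, or past its end, returns NA, which is why ids without an Alert (or with too few following rows) end up with NA rather than an error.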
Answer 1 (score: 0)
Another tidyverse / dplyr approach
Sample data
df <- read.table(text="id date type
2 2015-03-01 a
3 2015-12-12 b
2 2015-01-05 Alert
2 2015-01-15 c
2 2015-01-01 d
2 2015-12-02 a
3 2015-12-02 Alert
3 2015-12-02 a
4 2015-12-02 b
3 2015-12-12 a", header = TRUE, stringsAsFactors = FALSE)
Code
library( tidyverse )
df %>%
  #set date as Date-class
  mutate( date = as.Date( date ) ) %>%
  arrange( date ) %>%
  group_by( id ) %>%
  #for Alert rows, calculate days to the next two events within the same id
  mutate( days_diffrence_1 = ifelse( type == "Alert", lead( date, n = 1L ) - date, NA ),
          days_diffrence_2 = ifelse( type == "Alert", lead( date, n = 2L ) - date, NA ) ) %>%
  #keep only Alert rows that are followed by at least one event
  filter( !is.na( days_diffrence_1 ) )
Result
# id date type days_diffrence_1 days_diffrence_2
# <int> <date> <chr> <dbl> <dbl>
# 1 2 2015-01-05 Alert 10 55
# 2 3 2015-12-02 Alert 0 10