Calculating the time between a special event and each of the next two consecutive dates

Time: 2018-10-10 07:24:06

Tags: r dataframe

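I have the data frame below. For each id, I want to first find the position of a special event (Alert) and then calculate the time difference, in days, between that event and each of the next two consecutive rows by date.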

id     date         type
2      2015-03-01   a
3      2015-12-12   b
2      2015-01-05   Alert
2      2015-01-15   c
2      2015-01-01   d
2      2015-12-02   a
3      2015-12-02   Alert
3      2015-12-02   a
4      2015-12-02   b
3      2015-12-12   a
...

The following data frame is the expected output:

id     days_diffrence_1     days_difference_2
2      10                   55
3      0                    10
4      nan                  nan
... 
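
For id 2, the expected values follow from sorting its rows by date and measuring from the Alert on 2015-01-05 to the next two dates (2015-01-15 and 2015-03-01). A quick base R check of that arithmetic, where the variable names are only illustrative:

```r
# Dates taken from the id 2 rows of the sample data, after sorting by date
alert  <- as.Date("2015-01-05")
next_1 <- as.Date("2015-01-15")
next_2 <- as.Date("2015-03-01")

# Subtracting Date objects gives a difftime in days; as.numeric drops the class
d1 <- as.numeric(next_1 - alert)
d2 <- as.numeric(next_2 - alert)

print(c(d1, d2))
```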

I tried the following, but it does not give the right result:

  table <- df %>% 
  group_by(id) %>%
  summarise(days_diffrence_1 = as.numeric(date[2] - date[1]),
            days_difference_2 = as.numeric(date[3] - date[1]))

2 answers:

Answer 0 (score: 0)

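We can use dplyr to group_by id, arrange by date, get the index of the first occurrence of the special event ("Alert") if there is one, and subtract the date at that index from the date at each of the next two indices. If "Alert" does not occur in a group, we return NA.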

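library(dplyr)

df %>%
   # date may still be character after import; convert so subtraction yields days
   mutate(date = as.Date(date)) %>%
   group_by(id) %>%
   arrange(date) %>%
   # inds: position of the first "Alert" within each id, or NA if there is none
   summarise(inds = if (any(type == "Alert")) which.max(type == "Alert") else NA_integer_,
        days_diffrence_1 = as.numeric(date[inds+1] - date[inds]),
        days_diffrence_2 = as.numeric(date[inds+2] - date[inds])) %>%
   select(-inds)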


#     id days_diffrence_1 days_diffrence_2
#  <int>            <dbl>            <dbl>
#1     2               10               55
#2     3                0               10
#3     4               NA               NA
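
The any() guard in the code above matters because which.max() applied to an all-FALSE logical vector still returns 1, which would silently treat the first row of a group as the alert. A minimal base R sketch of that behaviour, using made-up vectors and a hypothetical helper named first_alert:

```r
# Hypothetical type columns for two groups (illustrative data only)
with_alert <- c("d", "Alert", "c", "a")
no_alert   <- c("b")

# which.max() gives the position of the first TRUE ...
pos_with <- which.max(with_alert == "Alert")

# ... but for an all-FALSE vector it still returns 1
pos_none <- which.max(no_alert == "Alert")

# Hypothetical helper: return NA when the group contains no alert
first_alert <- function(type) {
  if (any(type == "Alert")) which.max(type == "Alert") else NA_integer_
}

idx_with <- first_alert(with_alert)
idx_none <- first_alert(no_alert)
```

Indexing a vector with NA yields NA, so a group without an alert ends up with NA in both difference columns.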

Answer 1 (score: 0)

Another tidyverse / dplyr approach

Sample data

df <- read.table(text="id     date         type
2      2015-03-01   a
3      2015-12-12   b
2      2015-01-05   Alert
2      2015-01-15   c
2      2015-01-01   d
2      2015-12-02   a
3      2015-12-02   Alert
3      2015-12-02   a
4      2015-12-02   b
3      2015-12-12   a", header = TRUE, stringsAsFactors = FALSE)

Code

library( tidyverse )

df %>% 
  #set date as Date-class
  mutate( date = as.Date( date ) ) %>%
  arrange( date ) %>%
  group_by( id ) %>%
  #calculate days to next event 
  mutate( days_diffrence_1 = ifelse( type == "Alert", lead( date, n = 1L, order_by = id ) - date, NA ),
          days_diffrence_2 = ifelse( type == "Alert", lead( date, n = 2L, order_by = id ) - date, NA ) ) %>%
  filter( !is.na( days_diffrence_1 ) )

Result

#      id date       type  days_diffrence_1 days_diffrence_2
#   <int> <date>     <chr>            <dbl>            <dbl>
# 1     2 2015-01-05 Alert               10               55
# 2     3 2015-12-02 Alert                0               10
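
Note that this result omits id 4, which has no Alert row, whereas the expected output in the question lists it with missing values. One possible way to bring such ids back is to left-join the per-alert result onto the distinct ids. The sketch below is a variation on the code above rather than the same code: it computes the differences for every row, then keeps only the first Alert per id. The names alerts, all_ids and result are illustrative.

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "id date type
2 2015-03-01 a
3 2015-12-12 b
2 2015-01-05 Alert
2 2015-01-15 c
2 2015-01-01 d
2 2015-12-02 a
3 2015-12-02 Alert
3 2015-12-02 a
4 2015-12-02 b
3 2015-12-12 a", header = TRUE, stringsAsFactors = FALSE)

alerts <- df %>%
  mutate(date = as.Date(date)) %>%
  arrange(date) %>%
  group_by(id) %>%
  # days from each row to the next row and to the one after it, within id
  mutate(days_diffrence_1 = as.numeric(lead(date, 1L) - date),
         days_diffrence_2 = as.numeric(lead(date, 2L) - date)) %>%
  filter(type == "Alert") %>%
  slice(1) %>%   # assumption: only the first alert per id is of interest
  ungroup() %>%
  select(id, days_diffrence_1, days_diffrence_2)

# Every id in the data, including those without an alert
all_ids <- df %>% distinct(id) %>% arrange(id)

# ids with no alert get NA in both difference columns
result <- left_join(all_ids, alerts, by = "id")
print(result)
```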