Create a duration column containing the datetime difference grouped by ID

Asked: 2018-05-08 21:33:51

Tags: r dataframe dplyr data.table

I have a data frame like this:

ID <- c("111","111","112","112",
        "113","113","114","114",
        "115","116")
ACTION <- c("UA Created","UA Complete","UA Created","UA Complete",
            "UA Created","UA Expired","UA Created","UA Expired",
            "UA Created","UA Created")
Datetime <- c("2018-04-15 12:44:11","2018-04-17 12:44:11","2018-04-18 19:07:11","2018-04-19 21:11:09",
              "2018-04-23 22:24:11","2018-04-23 22:44:11","2018-04-25 17:07:11","2018-05-05 21:11:09",
              "2018-04-22 21:11:09", "2018-04-26 21:11:09")
STATUS <- c(NA,"Done",NA,"Done",
            NA,NA,NA,NA,
            NA,NA)

df <- data.frame(ID,ACTION,Datetime,STATUS) 
df$Datetime <- as.POSIXct(df$Datetime,format="%Y-%m-%d %H:%M:%S")

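I am trying to create a column named "DURATION_DAYS" that holds the difference between the two datetimes within each ID. For each ID I only want to return the row with ACTION = 'UA Complete' or 'UA Expired', together with the calculated duration.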

My desired output is:

   ID      ACTION            Datetime STATUS DURATION_DAYS
  111 UA Complete 2018-04-17 12:44:11   Done    2.00000000
  112 UA Complete 2018-04-19 21:11:09   Done    1.08608796
  113  UA Expired 2018-04-23 22:44:11     NA    0.01388889
  114  UA Expired 2018-05-05 21:11:09     NA   10.16942130
  115  UA Created 2018-04-22 21:11:09     NA            NA
  116  UA Created 2018-04-26 21:11:09     NA            NA

I tried to do this with dplyr, but somehow I am missing the logic:

    library(dplyr)
    library(lubridate)
        df1 <- df %>% 
        group_by(ID) %>%
        mutate(DURATION_DAYS = as.numeric(difftime(dmy_hm(Datetime), 
                                           dmy_hm(Datetime)[1], units = 'days')))

1 Answer:

Answer 0 (score: 3)

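You are very close to the solution. You don't need dmy_hm (or any other lubridate parser such as ymd_hm), because Datetime is already of type POSIXct. In addition, you need to use min and max to get the time difference within each ID.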

library(dplyr)
library(lubridate)
df %>% 
  group_by(ID) %>%
  mutate(DURATION_DAYS = (difftime(max(Datetime), 
                                  min(Datetime), units = 'days'))) %>%
  filter(ACTION %in% c("UA Complete", "UA Expired"))

# # A tibble: 4 x 5
# # Groups: ID [4]
# ID     ACTION      Datetime            STATUS DURATION_DAYS     
# <fctr> <fctr>      <dttm>              <fctr> <time>            
# 1 111    UA Complete 2018-04-17 12:44:11 Done   2                 
# 2 112    UA Complete 2018-04-19 21:11:09 Done   1.08608796296296  
# 3 113    UA Expired  2018-04-23 22:44:11 <NA>   0.0138888888888889
# 4 114    UA Expired  2018-05-05 21:11:09 <NA>   10.1694212962963
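
Note that the filter above drops IDs 115 and 116, while the desired output in the question keeps them with an NA duration. The following is a minimal sketch of one way to keep them, not part of the original answer. It assumes that the last row of each ID (by Datetime) is the one to return, that an ID with a single row should get NA, and that the timestamps can be read as UTC. The name `result` is only illustrative.

```r
library(dplyr)

# Same data as in the question; tz = "UTC" is an assumption made here so
# that the day differences do not depend on the local time zone.
df <- data.frame(
  ID = c("111", "111", "112", "112", "113", "113", "114", "114", "115", "116"),
  ACTION = c("UA Created", "UA Complete", "UA Created", "UA Complete",
             "UA Created", "UA Expired", "UA Created", "UA Expired",
             "UA Created", "UA Created"),
  Datetime = as.POSIXct(c("2018-04-15 12:44:11", "2018-04-17 12:44:11",
                          "2018-04-18 19:07:11", "2018-04-19 21:11:09",
                          "2018-04-23 22:24:11", "2018-04-23 22:44:11",
                          "2018-04-25 17:07:11", "2018-05-05 21:11:09",
                          "2018-04-22 21:11:09", "2018-04-26 21:11:09"),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  STATUS = c(NA, "Done", NA, "Done", NA, NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(ID) %>%
  # Duration between the earliest and latest row of the ID;
  # NA when the ID has only a single ("UA Created") row.
  mutate(DURATION_DAYS = if (n() > 1) {
    as.numeric(difftime(max(Datetime), min(Datetime), units = "days"))
  } else {
    NA_real_
  }) %>%
  # Keep only the most recent row of each ID.
  filter(Datetime == max(Datetime)) %>%
  ungroup()

print(result)
```

Wrapping `difftime()` in `as.numeric()` makes DURATION_DAYS a plain double column instead of a difftime, which is what allows mixing real durations and `NA_real_` across groups.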