我有一个像这样的数据框
ID <- c("111","111","112","112",
"113","113","114","114",
"115","116")
ACTION <- c("UA Created","UA Complete","UA Created","UA Complete",
"UA Created","UA Expired","UA Created","UA Expired",
"UA Created","UA Created")
Datetime <- c("2018-04-15 12:44:11","2018-04-17 12:44:11","2018-04-18 19:07:11","2018-04-19 21:11:09",
"2018-04-23 22:24:11","2018-04-23 22:44:11","2018-04-25 17:07:11","2018-05-05 21:11:09",
"2018-04-22 21:11:09", "2018-04-26 21:11:09")
STATUS <- c(NA,"Done",NA,"Done",
NA,NA,NA,NA,
NA,NA)
df <- data.frame(ID,ACTION,Datetime,STATUS)
df$Datetime <- as.POSIXct(df$Datetime,format="%Y-%m-%d %H:%M:%S")
我正在尝试在按ID分组的2个日期时间之间创建一个名为“DURATION_DAYS”的列。我只想返回具有ACTION ='UA Complete'或'UA Expired'的行,以获取相同的ID以及计算的持续时间。
我想要的输出是
ID ACTION Datetime STATUS DURATION_DAYS
111 UA Complete 2018-04-17 12:44:11 Done 2.00000000
112 UA Complete 2018-04-19 21:11:09 Done 1.08608796
113 UA Expired 2018-04-23 22:44:11 NA 0.01388889
114 UA Expired 2018-05-05 21:11:09 NA 10.16942130
115 UA Created 2018-04-22 21:11:09 NA NA
116 UA Created 2018-04-26 21:11:09 NA NA
我尝试使用dplyr这样做,但不知何故错过了逻辑
library(dplyr)
library(lubridate)
df1 <- df %>%
group_by(ID) %>%
mutate(DURATION_DAYS = as.numeric(difftime(dmy_hm(Datetime),
dmy_hm(Datetime)[1], units = 'days')))
答案 0 :(得分:3)
你非常接近解决方案。您不需要使用ymd_hm
,因为Datetime已经是POSIXct
类型。此外,您需要使用min
和max
来获取ID的时间差异。
library(dplyr)
library(lubridate)
df %>%
group_by(ID) %>%
mutate(DURATION_DAYS = (difftime(max(Datetime),
min(Datetime), units = 'days'))) %>%
filter(ACTION %in% c("UA Complete", "UA Expired"))
# # A tibble: 4 x 5
# # Groups: ID [4]
# ID ACTION Datetime STATUS DURATION_DAYS
# <fctr> <fctr> <dttm> <fctr> <time>
# 1 111 UA Complete 2018-04-17 12:44:11 Done 2
# 2 112 UA Complete 2018-04-19 21:11:09 Done 1.08608796296296
# 3 113 UA Expired 2018-04-23 22:44:11 <NA> 0.0138888888888889
# 4 114 UA Expired 2018-05-05 21:11:09 <NA> 10.1694212962963