我有一个包含4个变量的数据,其中2个是日期变量。我想检查带有TYPE == “OT”
或TYPE == “NON-OT”
的行的间隔是否落在带有TYPE == “ICU”
的上一行的间隔内。
数据:
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1), TYPE = c("NON-OT", "NON-OT", "OT", "ICU", "OT",
"NON-OT", "OT", "NON-OT", "ICU", "OT", "OT", "ICU", "OT", "OT",
"NON-OT", "OT", "NON-OT"), DATE1 = structure(c(1427214540, 1427216280,
1427279700, 1427370420, 1427543700, 1427564520, 1427800800, 1427849280,
1427850240, 1427927400, 1428155400, 1428166380, 1428514500, 1428927000,
1429167600, 1429264500, 1429388160), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), DATE2 = structure(c(1427216280, 1427370420,
1427279700, 1427564520, 1427543700, 1427849280, 1427800800, 1427850240,
1428166380, 1427927400, 1428155400, 1429388160, 1428514500, 1428927000,
1429167600, 1429264500, 1430362020), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), .Names = c("id", "TYPE", "DATE1", "DATE2"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-17L))
# id TYPE DATE1 DATE2
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00
这就是我所做的:
INT
,该变量为每一行提供DATE1
和DATE2
之间的间隔。INT_ICU
,该变量仅给出TYPE == “ICU”
的行的间隔并填写(这是问题所在,因为fill
中的tidyr
函数可以不填写缺少的间隔值。)WITHIN_ICU
,如果该间隔在ICU的间隔之内,则返回TRUE,否则为FALSE。代码:
library(tidyverse)
df %>%
mutate(INT = interval(DATE1, DATE2),
INT_ICU = if_else(TYPE == "ICU", interval(DATE1, DATE2), NA_real_)) %>%
fill(INT_ICU) %>%
mutate(WITHIN_ICU = INT %within% INT_ICU)
输出:
如您所见,即使我应用了INT_ICU
函数,fill
变量中仍有许多缺失值。
# id TYPE DATE1 DATE2 INT INT_ICU WITHIN_ICU
# <dbl> <chr> <dttm> <dttm> <S4: Interval> <S4: Interval> <lgl>
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 2015-03-24 16:29:00 UTC--2015-03-24 16:58:00 UTC NA--NA NA
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 2015-03-24 16:58:00 UTC--2015-03-26 11:47:00 UTC NA--NA NA
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00 2015-03-25 10:35:00 UTC--2015-03-25 10:35:00 UTC NA--NA NA
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00 2015-03-28 11:55:00 UTC--2015-03-28 11:55:00 UTC NA--NA NA
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 2015-03-28 17:42:00 UTC--2015-04-01 00:48:00 UTC NA--NA NA
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00 2015-03-31 11:20:00 UTC--2015-03-31 11:20:00 UTC NA--NA NA
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 2015-04-01 00:48:00 UTC--2015-04-01 01:04:00 UTC NA--NA NA
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00 2015-04-01 22:30:00 UTC--2015-04-01 22:30:00 UTC NA--NA NA
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00 2015-04-04 13:50:00 UTC--2015-04-04 13:50:00 UTC NA--NA NA
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00 2015-04-08 17:35:00 UTC--2015-04-08 17:35:00 UTC NA--NA NA
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00 2015-04-13 12:10:00 UTC--2015-04-13 12:10:00 UTC NA--NA NA
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 2015-04-16 07:00:00 UTC--2015-04-16 07:00:00 UTC NA--NA NA
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00 2015-04-17 09:55:00 UTC--2015-04-17 09:55:00 UTC NA--NA NA
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 2015-04-18 20:16:00 UTC--2015-04-30 02:47:00 UTC NA--NA NA
所需的输出:
# id TYPE DATE1 DATE2 WITHIN_ICU
# <dbl> <chr> <dttm> <dttm> <lgl>
# 1 1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 NA
# 2 1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 NA
# 3 1 OT 2015-03-25 10:35:00 2015-03-25 10:35:00 NA
# 4 1 ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 TRUE
# 5 1 OT 2015-03-28 11:55:00 2015-03-28 11:55:00 TRUE
# 6 1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 FALSE
# 7 1 OT 2015-03-31 11:20:00 2015-03-31 11:20:00 FALSE
# 8 1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 FALSE
# 9 1 ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 TRUE
# 10 1 OT 2015-04-01 22:30:00 2015-04-01 22:30:00 TRUE
# 11 1 OT 2015-04-04 13:50:00 2015-04-04 13:50:00 TRUE
# 12 1 ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 TRUE
# 13 1 OT 2015-04-08 17:35:00 2015-04-08 17:35:00 TRUE
# 14 1 OT 2015-04-13 12:10:00 2015-04-13 12:10:00 TRUE
# 15 1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 TRUE
# 16 1 OT 2015-04-17 09:55:00 2015-04-17 09:55:00 TRUE
# 17 1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 FALSE
答案 0 :(得分:0)
这应该有效
# use own function to fill rather than using dplyr's fill
f2 <- function(x) {
for(i in seq_along(x)[-1]) if(is.na(x@start[i])) x[i] <- x[i-1]#check if Start in S4 interval object is NA.
x
}
df %>%
mutate(INT = interval(DATE1, DATE2),
INT_ICU = if_else(TYPE == "ICU", interval(DATE1, DATE2), NA_real_)) %>%
mutate(INT_ICU = f2(t$INT_ICU)) %>% #instead of fill
mutate(WITHIN_ICU = INT %within% INT_ICU)
输出:
# A tibble: 17 x 6
id TYPE DATE1 DATE2 INT_ICU WITHIN_ICU
<dbl> <chr> <dttm> <dttm> <S4: Interval> <lgl>
1 1. NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 NA--NA NA
2 1. NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 NA--NA NA
3 1. OT 2015-03-25 10:35:00 2015-03-25 10:35:00 NA--NA NA
4 1. ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
5 1. OT 2015-03-28 11:55:00 2015-03-28 11:55:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE
6 1. NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
7 1. OT 2015-03-31 11:20:00 2015-03-31 11:20:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
8 1. NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE
9 1. ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
10 1. OT 2015-04-01 22:30:00 2015-04-01 22:30:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
11 1. OT 2015-04-04 13:50:00 2015-04-04 13:50:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE
12 1. ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
13 1. OT 2015-04-08 17:35:00 2015-04-08 17:35:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
14 1. OT 2015-04-13 12:10:00 2015-04-13 12:10:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
15 1. NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
16 1. OT 2015-04-17 09:55:00 2015-04-17 09:55:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE
17 1. NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC FALSE