Fill in missing interval values in R

Time: 2018-08-23 02:28:16

Tags: r dplyr tidyverse tidyr lubridate

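I have data with four variables, two of which are dates (DATE1 and DATE2). For rows with TYPE == "OT" or TYPE == "NON-OT", I want to check whether the row's interval (DATE1 to DATE2) falls within the interval of the nearest preceding row with TYPE == "ICU".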
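The "falls within" test only needs the endpoints: a row's interval lies inside an ICU interval when it starts no earlier and ends no later. lubridate's %within% documents the same rule for two Interval objects (both the start and the end of a must fall within b). Below is a minimal base R sketch; within_interval() and utc() are hypothetical helpers, not part of any package.

```r
# Hypothetical helper: TRUE where [start, end] lies entirely inside
# [ref_start, ref_end]. Vectorised over POSIXct (or numeric) inputs;
# a missing reference interval (NA bounds) gives NA.
within_interval <- function(start, end, ref_start, ref_end) {
  start >= ref_start & end <= ref_end
}

# Hypothetical shorthand for parsing UTC timestamps
utc <- function(x) as.POSIXct(x, tz = "UTC")

# ICU stay from row 4 of the sample data
icu_start <- utc("2015-03-26 11:47:00")
icu_end   <- utc("2015-03-28 17:42:00")

# Row 5 (OT) falls inside it; row 6 (NON-OT) ends after it.
within_interval(utc("2015-03-28 11:55:00"), utc("2015-03-28 11:55:00"),
                icu_start, icu_end)
within_interval(utc("2015-03-28 17:42:00"), utc("2015-04-01 00:48:00"),
                icu_start, icu_end)
```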

Data:

df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1), TYPE = c("NON-OT", "NON-OT", "OT", "ICU", "OT",
"NON-OT", "OT", "NON-OT", "ICU", "OT", "OT", "ICU", "OT", "OT",
"NON-OT", "OT", "NON-OT"), DATE1 = structure(c(1427214540, 1427216280,
1427279700, 1427370420, 1427543700, 1427564520, 1427800800, 1427849280,
1427850240, 1427927400, 1428155400, 1428166380, 1428514500, 1428927000,
1429167600, 1429264500, 1429388160), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), DATE2 = structure(c(1427216280, 1427370420,
1427279700, 1427564520, 1427543700, 1427849280, 1427800800, 1427850240,
1428166380, 1427927400, 1428155400, 1429388160, 1428514500, 1428927000,
1429167600, 1429264500, 1430362020), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), .Names = c("id", "TYPE", "DATE1", "DATE2"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-17L))

#    id   TYPE               DATE1               DATE2
# 1   1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00
# 2   1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00
# 3   1     OT 2015-03-25 10:35:00 2015-03-25 10:35:00
# 4   1    ICU 2015-03-26 11:47:00 2015-03-28 17:42:00
# 5   1     OT 2015-03-28 11:55:00 2015-03-28 11:55:00
# 6   1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00
# 7   1     OT 2015-03-31 11:20:00 2015-03-31 11:20:00
# 8   1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00
# 9   1    ICU 2015-04-01 01:04:00 2015-04-04 16:53:00
# 10  1     OT 2015-04-01 22:30:00 2015-04-01 22:30:00
# 11  1     OT 2015-04-04 13:50:00 2015-04-04 13:50:00
# 12  1    ICU 2015-04-04 16:53:00 2015-04-18 20:16:00
# 13  1     OT 2015-04-08 17:35:00 2015-04-08 17:35:00
# 14  1     OT 2015-04-13 12:10:00 2015-04-13 12:10:00
# 15  1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00
# 16  1     OT 2015-04-17 09:55:00 2015-04-17 09:55:00
# 17  1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00

Here is what I did:

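  1. Create a new variable INT holding the interval between DATE1 and DATE2 for each row.
  2. Create another variable INT_ICU that holds the interval only for rows with TYPE == "ICU", then fill it downwards. (This is where the problem is: tidyr's fill() does not fill in the missing interval values.)
  3. Create a logical variable WITHIN_ICU that is TRUE if the row's interval lies within the ICU interval and FALSE otherwise.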
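The three steps can also be carried out without filling an Interval object at all. A minimal base R sketch, assuming a single id whose rows are already in chronological order (as in the sample data): record the row index of the most recent ICU row with cummax(), use it to look up that row's DATE1 and DATE2, and compare plain POSIXct values. The names last_icu, ICU_START and ICU_END are illustrative, not from the question.

```r
# Sample data from the question, rebuilt in base R (seconds since epoch, UTC)
df <- data.frame(
  id = 1,
  TYPE = c("NON-OT", "NON-OT", "OT", "ICU", "OT", "NON-OT", "OT", "NON-OT",
           "ICU", "OT", "OT", "ICU", "OT", "OT", "NON-OT", "OT", "NON-OT"),
  DATE1 = .POSIXct(c(1427214540, 1427216280, 1427279700, 1427370420,
                     1427543700, 1427564520, 1427800800, 1427849280,
                     1427850240, 1427927400, 1428155400, 1428166380,
                     1428514500, 1428927000, 1429167600, 1429264500,
                     1429388160), tz = "UTC"),
  DATE2 = .POSIXct(c(1427216280, 1427370420, 1427279700, 1427564520,
                     1427543700, 1427849280, 1427800800, 1427850240,
                     1428166380, 1427927400, 1428155400, 1429388160,
                     1428514500, 1428927000, 1429167600, 1429264500,
                     1430362020), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Step 2 without fill(): index of the latest ICU row at or above each row
# (0 before the first ICU row), turned into NA where no ICU row exists yet.
last_icu <- cummax(ifelse(df$TYPE == "ICU", seq_len(nrow(df)), 0L))
last_icu[last_icu == 0L] <- NA

# Look up that ICU row's bounds; indexing POSIXct keeps class and time zone.
df$ICU_START <- df$DATE1[last_icu]
df$ICU_END   <- df$DATE2[last_icu]

# Step 3: inside the ICU stay if the row starts no earlier and ends no later.
df$WITHIN_ICU <- df$DATE1 >= df$ICU_START & df$DATE2 <= df$ICU_END
df[, c("TYPE", "DATE1", "DATE2", "WITHIN_ICU")]
```

Rows before the first ICU row have no reference interval, so their comparisons are NA, which matches the desired output.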

Code:

library(tidyverse)
library(lubridate)  # interval() and %within% come from lubridate
df %>%
  mutate(INT = interval(DATE1, DATE2),
         INT_ICU = if_else(TYPE == "ICU", interval(DATE1, DATE2), NA_real_)) %>%
  fill(INT_ICU) %>%
  mutate(WITHIN_ICU = INT %within% INT_ICU)

Output: as you can see, even though I applied fill(), the INT_ICU variable still has many missing values.

#      id   TYPE               DATE1               DATE2                                              INT                                          INT_ICU WITHIN_ICU
#   <dbl>  <chr>              <dttm>              <dttm>                                   <S4: Interval>                                   <S4: Interval>      <lgl>
# 1     1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 2015-03-24 16:29:00 UTC--2015-03-24 16:58:00 UTC                                           NA--NA         NA
# 2     1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 2015-03-24 16:58:00 UTC--2015-03-26 11:47:00 UTC                                           NA--NA         NA
# 3     1     OT 2015-03-25 10:35:00 2015-03-25 10:35:00 2015-03-25 10:35:00 UTC--2015-03-25 10:35:00 UTC                                           NA--NA         NA
# 4     1    ICU 2015-03-26 11:47:00 2015-03-28 17:42:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC       TRUE
# 5     1     OT 2015-03-28 11:55:00 2015-03-28 11:55:00 2015-03-28 11:55:00 UTC--2015-03-28 11:55:00 UTC                                           NA--NA         NA
# 6     1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 2015-03-28 17:42:00 UTC--2015-04-01 00:48:00 UTC                                           NA--NA         NA
# 7     1     OT 2015-03-31 11:20:00 2015-03-31 11:20:00 2015-03-31 11:20:00 UTC--2015-03-31 11:20:00 UTC                                           NA--NA         NA
# 8     1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 2015-04-01 00:48:00 UTC--2015-04-01 01:04:00 UTC                                           NA--NA         NA
# 9     1    ICU 2015-04-01 01:04:00 2015-04-04 16:53:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC       TRUE
# 10    1     OT 2015-04-01 22:30:00 2015-04-01 22:30:00 2015-04-01 22:30:00 UTC--2015-04-01 22:30:00 UTC                                           NA--NA         NA
# 11    1     OT 2015-04-04 13:50:00 2015-04-04 13:50:00 2015-04-04 13:50:00 UTC--2015-04-04 13:50:00 UTC                                           NA--NA         NA
# 12    1    ICU 2015-04-04 16:53:00 2015-04-18 20:16:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC       TRUE
# 13    1     OT 2015-04-08 17:35:00 2015-04-08 17:35:00 2015-04-08 17:35:00 UTC--2015-04-08 17:35:00 UTC                                           NA--NA         NA
# 14    1     OT 2015-04-13 12:10:00 2015-04-13 12:10:00 2015-04-13 12:10:00 UTC--2015-04-13 12:10:00 UTC                                           NA--NA         NA
# 15    1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 2015-04-16 07:00:00 UTC--2015-04-16 07:00:00 UTC                                           NA--NA         NA
# 16    1     OT 2015-04-17 09:55:00 2015-04-17 09:55:00 2015-04-17 09:55:00 UTC--2015-04-17 09:55:00 UTC                                           NA--NA         NA
# 17    1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 2015-04-18 20:16:00 UTC--2015-04-30 02:47:00 UTC                                           NA--NA         NA

Desired output:

#      id   TYPE               DATE1               DATE2 WITHIN_ICU
#   <dbl>  <chr>              <dttm>              <dttm>      <lgl>
# 1     1 NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00         NA
# 2     1 NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00         NA
# 3     1     OT 2015-03-25 10:35:00 2015-03-25 10:35:00         NA
# 4     1    ICU 2015-03-26 11:47:00 2015-03-28 17:42:00       TRUE
# 5     1     OT 2015-03-28 11:55:00 2015-03-28 11:55:00       TRUE
# 6     1 NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00       FALSE
# 7     1     OT 2015-03-31 11:20:00 2015-03-31 11:20:00       FALSE
# 8     1 NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00       FALSE
# 9     1    ICU 2015-04-01 01:04:00 2015-04-04 16:53:00       TRUE
# 10    1     OT 2015-04-01 22:30:00 2015-04-01 22:30:00       TRUE
# 11    1     OT 2015-04-04 13:50:00 2015-04-04 13:50:00       TRUE
# 12    1    ICU 2015-04-04 16:53:00 2015-04-18 20:16:00       TRUE
# 13    1     OT 2015-04-08 17:35:00 2015-04-08 17:35:00       TRUE
# 14    1     OT 2015-04-13 12:10:00 2015-04-13 12:10:00       TRUE
# 15    1 NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00       TRUE
# 16    1     OT 2015-04-17 09:55:00 2015-04-17 09:55:00       TRUE
# 17    1 NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00       FALSE

1 answer:

Answer 0 (score: 0)

This should work:

library(tidyverse)
library(lubridate)

# use our own function to fill rather than tidyr's fill()
f2 <- function(x) {
  for (i in seq_along(x)[-1]) {
    # if the start of this S4 Interval is NA, carry the previous interval forward
    if (is.na(x@start[i])) x[i] <- x[i - 1]
  }
  x
}

df %>%
  mutate(INT = interval(DATE1, DATE2),
         INT_ICU = if_else(TYPE == "ICU", interval(DATE1, DATE2), NA_real_)) %>%
  mutate(INT_ICU = f2(INT_ICU)) %>%  # instead of fill()
  mutate(WITHIN_ICU = INT %within% INT_ICU)
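An alternative that avoids the custom loop, assuming only the WITHIN_ICU flag is needed: carry the ICU start and end times forward as two ordinary POSIXct columns, which tidyr::fill() handles, and compare them directly, which is the same start/end check %within% performs for two intervals. ICU_START and ICU_END are illustrative names, grouping by id is an assumption for data with more than one patient, and only the first eight rows of the question's data are used to keep the sketch short.

```r
library(dplyr)
library(tidyr)

# First eight rows of the question's data (UTC)
df <- data.frame(
  id = 1,
  TYPE = c("NON-OT", "NON-OT", "OT", "ICU", "OT", "NON-OT", "OT", "NON-OT"),
  DATE1 = .POSIXct(c(1427214540, 1427216280, 1427279700, 1427370420,
                     1427543700, 1427564520, 1427800800, 1427849280), tz = "UTC"),
  DATE2 = .POSIXct(c(1427216280, 1427370420, 1427279700, 1427564520,
                     1427543700, 1427849280, 1427800800, 1427850240), tz = "UTC"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%  # assumption: an ICU stay never carries over to another id
  mutate(ICU_START = replace(DATE1, TYPE != "ICU", NA),  # keep bounds on ICU rows only
         ICU_END   = replace(DATE2, TYPE != "ICU", NA)) %>%
  fill(ICU_START, ICU_END) %>%  # plain POSIXct columns fill downwards
  ungroup() %>%
  mutate(WITHIN_ICU = DATE1 >= ICU_START & DATE2 <= ICU_END)

res$WITHIN_ICU
```

Filling the two timestamp columns sidesteps the question of how fill() treats an S4 Interval column; if an Interval is still wanted afterwards, it can be rebuilt from ICU_START and ICU_END with interval().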

Output:

# A tibble: 17 x 6
      id TYPE   DATE1               DATE2               INT_ICU                                          WITHIN_ICU
   <dbl> <chr>  <dttm>              <dttm>              <S4: Interval>                                   <lgl>     
 1    1. NON-OT 2015-03-24 16:29:00 2015-03-24 16:58:00 NA--NA                                           NA        
 2    1. NON-OT 2015-03-24 16:58:00 2015-03-26 11:47:00 NA--NA                                           NA        
 3    1. OT     2015-03-25 10:35:00 2015-03-25 10:35:00 NA--NA                                           NA        
 4    1. ICU    2015-03-26 11:47:00 2015-03-28 17:42:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE      
 5    1. OT     2015-03-28 11:55:00 2015-03-28 11:55:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC TRUE      
 6    1. NON-OT 2015-03-28 17:42:00 2015-04-01 00:48:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE     
 7    1. OT     2015-03-31 11:20:00 2015-03-31 11:20:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE     
 8    1. NON-OT 2015-04-01 00:48:00 2015-04-01 01:04:00 2015-03-26 11:47:00 UTC--2015-03-28 17:42:00 UTC FALSE     
 9    1. ICU    2015-04-01 01:04:00 2015-04-04 16:53:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE      
10    1. OT     2015-04-01 22:30:00 2015-04-01 22:30:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE      
11    1. OT     2015-04-04 13:50:00 2015-04-04 13:50:00 2015-04-01 01:04:00 UTC--2015-04-04 16:53:00 UTC TRUE      
12    1. ICU    2015-04-04 16:53:00 2015-04-18 20:16:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE      
13    1. OT     2015-04-08 17:35:00 2015-04-08 17:35:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE      
14    1. OT     2015-04-13 12:10:00 2015-04-13 12:10:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE      
15    1. NON-OT 2015-04-16 07:00:00 2015-04-16 07:00:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE      
16    1. OT     2015-04-17 09:55:00 2015-04-17 09:55:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC TRUE      
17    1. NON-OT 2015-04-18 20:16:00 2015-04-30 02:47:00 2015-04-04 16:53:00 UTC--2015-04-18 20:16:00 UTC FALSE