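Fill NAs with the last observed value from another column, adjusted by adding a constant

Date: 2019-04-25 16:09:07

Tags: r dplyr time-series

I have the start time, end time, and duration of a series of processes.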

        process_start            process_end    hourly_process_duration
  2019-01-01 00:00:00    2019-01-01 12:00:00                         12
  2019-01-01 12:00:00    2019-01-01 13:00:00                          1
                   NA                     NA                         11
                   NA                     NA                         15 
  2019-01-02 15:00:00    2019-01-02 18:00:00                          3

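hourly_process_duration is always present. The processes are continuous: when one process ends, the next one starts.

I need to fill in the NAs correctly, as in this example: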

        process_start            process_end    hourly_process_duration
  2019-01-01 00:00:00    2019-01-01 12:00:00                         12
  2019-01-01 12:00:00    2019-01-01 13:00:00                          1
  2019-01-01 13:00:00    2019-01-02 00:00:00                         11
  2019-01-02 00:00:00    2019-01-02 15:00:00                         15 
  2019-01-02 15:00:00    2019-01-02 18:00:00                          3

1 Answer:

Answer 0: (score: 1)

Here is one option for filling in the missing datetimes:

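library(dplyr)
library(lubridate)
df1 %>%
   # a missing start is the previous row's end; a missing end is the next row's start
   mutate(process_start = coalesce(process_start, lag(process_end)),
          process_end = coalesce(process_end, lead(process_start))) %>%
   # convert the character columns to date-times
   mutate_at(vars(process_start, process_end), ymd_hms) %>%
   # fill each remaining NA with the next row's value floored to midnight
   mutate_at(vars(process_start, process_end),
     list(~ replace(., is.na(.), floor_date(.[which(is.na(.))+1], "day"))))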
#        process_start         process_end hourly_process_duration
#1 2019-01-01 00:00:00 2019-01-01 12:00:00                      12
#2 2019-01-01 12:00:00 2019-01-01 13:00:00                       1
#3 2019-01-01 13:00:00 2019-01-02 00:00:00                      11
#4 2019-01-02 00:00:00 2019-01-02 15:00:00                      15
#5 2019-01-02 15:00:00 2019-01-02 18:00:00                       3
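
Note that the last step above fills the remaining gap with midnight of the next row's date. That matches the expected output for this sample, but it does not use hourly_process_duration. A minimal sketch of an alternative that walks the rows and adds each duration to the known start is shown below. It assumes that durations are in whole hours, that the first row has a known start, and that UTC is an acceptable time zone. `fill_by_duration` is a hypothetical helper name, not part of the original answer, and only base R is used.

```r
# Same sample data as in the question (timestamps as character, NA where unknown).
df1 <- data.frame(
  process_start = c("2019-01-01 00:00:00", "2019-01-01 12:00:00", NA, NA,
                    "2019-01-02 15:00:00"),
  process_end   = c("2019-01-01 12:00:00", "2019-01-01 13:00:00", NA, NA,
                    "2019-01-02 18:00:00"),
  hourly_process_duration = c(12L, 1L, 11L, 15L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill missing starts/ends row by row using the durations.
fill_by_duration <- function(df, tz = "UTC") {
  start <- as.POSIXct(df$process_start, tz = tz)
  end   <- as.POSIXct(df$process_end, tz = tz)
  for (i in seq_len(nrow(df))) {
    # A missing start is the previous row's end (processes run back to back).
    if (is.na(start[i]) && i > 1) start[i] <- end[i - 1]
    # A missing end is this row's start plus its duration (hours -> seconds).
    if (is.na(end[i])) end[i] <- start[i] + df$hourly_process_duration[i] * 3600
  }
  df$process_start <- start
  df$process_end   <- end
  df
}

res <- fill_by_duration(df1)
print(res)
```

The loop is used because each filled end feeds the next row's start; for long tables a vectorised version based on cumulative sums of the durations would be the natural next step.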

Data

df1 <- structure(list(
  process_start = c("2019-01-01 00:00:00", "2019-01-01 12:00:00", NA, NA,
                    "2019-01-02 15:00:00"),
  process_end = c("2019-01-01 12:00:00", "2019-01-01 13:00:00", NA, NA,
                  "2019-01-02 18:00:00"),
  hourly_process_duration = c(12L, 1L, 11L, 15L, 3L)),
  class = "data.frame", row.names = c(NA, -5L))
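
As a quick sanity check on any filled result, the two rules stated in the question can be tested directly: each end equals its start plus the duration, and each start equals the previous row's end. The sketch below assumes whole-hour durations and UTC timestamps. `check_continuity` is a hypothetical helper name, and the `expected` table is simply the desired output from the question.

```r
# Hypothetical helper: TRUE only if every row satisfies both rules.
check_continuity <- function(df, tz = "UTC") {
  start <- as.POSIXct(df$process_start, tz = tz)
  end   <- as.POSIXct(df$process_end, tz = tz)
  n     <- nrow(df)
  # Rule 1: end - start (in hours) matches the recorded duration.
  hours <- as.numeric(difftime(end, start, units = "hours"))
  durations_ok <- all(abs(hours - df$hourly_process_duration) < 1e-8)
  # Rule 2: each process starts exactly when the previous one ends.
  chained_ok <- all(start[-1] == end[-n])
  # isTRUE() turns NA (from missing timestamps) into FALSE.
  isTRUE(durations_ok) && isTRUE(chained_ok)
}

# The desired output from the question.
expected <- data.frame(
  process_start = c("2019-01-01 00:00:00", "2019-01-01 12:00:00",
                    "2019-01-01 13:00:00", "2019-01-02 00:00:00",
                    "2019-01-02 15:00:00"),
  process_end   = c("2019-01-01 12:00:00", "2019-01-01 13:00:00",
                    "2019-01-02 00:00:00", "2019-01-02 15:00:00",
                    "2019-01-02 18:00:00"),
  hourly_process_duration = c(12L, 1L, 11L, 15L, 3L),
  stringsAsFactors = FALSE
)

# The same table with the two middle rows still missing.
with_gaps <- expected
with_gaps$process_start[3:4] <- NA
with_gaps$process_end[3:4]   <- NA

ok_expected  <- check_continuity(expected)
ok_with_gaps <- check_continuity(with_gaps)
```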