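I have a database containing error records for a series of machines, each with its date and time. There are several types of error. For example: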
fechayhora id tipo
1: 2017-03-21 11:03:00 A2_LR1_Z1 APF
2: 2017-05-03 10:34:00 A2_LR1_Z1 APF
3: 2017-05-17 08:52:00 A2_LR1_Z1 APF
4: 2017-05-17 10:46:00 A2_LR1_Z1 APF
5: 2017-05-17 14:23:00 A2_LR1_Z1 APF
6: 2017-05-17 17:29:00 A2_LR1_Z1 APF
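I would like to add a column with the number of "APF" events that occurred in the preceding, say, 12 hours (ideally the window would be a parameter I can change).
Expected result: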
fechayhora id tipo number_of_APF_12h
1: 2017-03-21 11:03:00 A2_LR1_Z1 APF 0
2: 2017-05-03 10:34:00 A2_LR1_Z1 APF 0
3: 2017-05-17 08:52:00 A2_LR1_Z1 APF 0
4: 2017-05-17 10:46:00 A2_LR1_Z1 APF 1
5: 2017-05-17 14:23:00 A2_LR1_Z1 APF 2
6: 2017-05-17 17:29:00 A2_LR1_Z1 APF 3
Answer 0 (score: 2)
Here is a solution using purrr::map2_dbl(). You can change the number of hours to whatever you want.
suppressPackageStartupMessages(library(tibble))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(lubridate))
# Example data
df <- tribble(
~fechayhora, ~id, ~tipo,
"2017-03-21 11:03:00", "A2_LR1_Z1", "APF",
"2017-05-03 10:34:00", "A2_LR1_Z1", "APF",
"2017-05-17 08:52:00", "A2_LR1_Z1", "APF",
"2017-05-17 10:46:00", "A2_LR1_Z1", "APF",
"2017-05-17 14:23:00", "A2_LR1_Z1", "APF",
"2017-05-17 17:29:00", "A2_LR1_Z1", "APF"
)
# Convert fechayhora to a date-time and add a column holding the start
# of the 12-hour window (change hours(12) to use a different window)
df <- df %>%
  mutate(fechayhora = as.POSIXct(fechayhora),
         minus_12 = fechayhora - hours(12))
# Map over fechayhora and minus_12
# For each (fechayhora, minus_12) pair, find all the dates between them
# and sum the logical vector that is returned.
# The window includes the row itself, so subtract 1 to leave it out.
df <- df %>% mutate(
  number_of_APF_12h = map2_dbl(.x = fechayhora,
                               .y = minus_12,
                               .f = ~sum(between(df$fechayhora, .y, .x)) - 1))
df %>%
  select(fechayhora, number_of_APF_12h)
#> # A tibble: 6 x 2
#> fechayhora number_of_APF_12h
#> <dttm> <dbl>
#> 1 2017-03-21 11:03:00 0
#> 2 2017-05-03 10:34:00 0
#> 3 2017-05-17 08:52:00 0
#> 4 2017-05-17 10:46:00 1
#> 5 2017-05-17 14:23:00 2
#> 6 2017-05-17 17:29:00 3
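Note that the code above counts every row in the window, regardless of id or tipo, which is fine for the example because all rows share the same machine and error type. If the real data mixes several machines and error types, something like the following base R sketch may be closer to what is wanted. The helper name count_prior_events and its window_hours argument are made up for illustration and are not part of any package. It also counts only strictly earlier events, so two rows with identical timestamps would not count each other, which differs slightly from the between() approach.

```r
# Hypothetical helper (not from any package): for each row, count the
# earlier events with the same id and tipo that fall within the
# preceding `window_hours` hours.
count_prior_events <- function(times, id, tipo, window_hours = 12) {
  window_secs <- window_hours * 3600
  t_num <- as.numeric(times)  # seconds since the epoch
  vapply(seq_along(t_num), function(i) {
    same_group <- id == id[i] & tipo == tipo[i]
    in_window <- t_num < t_num[i] & t_num >= t_num[i] - window_secs
    sum(same_group & in_window)
  }, integer(1))
}

# Same example data as in the question
df <- data.frame(
  fechayhora = as.POSIXct(c("2017-03-21 11:03:00", "2017-05-03 10:34:00",
                            "2017-05-17 08:52:00", "2017-05-17 10:46:00",
                            "2017-05-17 14:23:00", "2017-05-17 17:29:00"),
                          tz = "UTC"),
  id = "A2_LR1_Z1",
  tipo = "APF",
  stringsAsFactors = FALSE
)

df$number_of_APF_12h <- count_prior_events(df$fechayhora, df$id, df$tipo,
                                           window_hours = 12)
print(df)
```

Changing window_hours is then the only thing needed to use a different window. The loop compares every row with every other row, so this is a sketch for small to medium tables rather than something tuned for very large ones.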