Check whether a condition was met on the days before the current one (R)

Date: 2021-01-21 13:51:46

Tags: r date conditional-statements

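I have a database table ordered by date, with one measurement per hour. If the measurement is greater than 5, criterion 1 is met (I have already done this with mutate in dplyr). Like this: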

Date        Hour      Measure  Meets_criteria_1
01/01/2020  00:00:00  7        1
01/01/2020  01:00:00  12       1
01/01/2020  02:00:00  3        0
...
10/01/2020  21:00:00  2        0
10/01/2020  22:00:00  15       1
10/01/2020  23:00:00  20       1

Now I want to know how many times the criterion is met within a day. To do that, I sum all the 1s in "Meets_criteria_1", grouped by day:

Date        Hour      Meets_criteria_1  Sum_criteria_1
01/01/2020  00:00:00  1                 3
01/01/2020  01:00:00  1                 3
01/01/2020  02:00:00  0                 3
...
10/01/2020  21:00:00  0                 11
10/01/2020  22:00:00  1                 11
10/01/2020  23:00:00  1                 11

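However, I need a third condition for criterion 2 to be met: "on the two previous days, "Sum_criteria_1" was at least 6".

My question is: how do I tell R to check the two days before the current one? For example, if I look at the date 08/01/2020, the criterion is met only if sum_criteria_1 is at least 6 on both 06/01/2020 and 07/01/2020.

Edit: I tried using lag, but I only got NA: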

mutate(yesterday = lag(sum_criteria_1, n=1), 
       day_before = lag(sum_criteria_1, n = 2))

Any help is welcome! Thanks

1 answer:

Answer 0: (score: 1)

Use something like this:

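library(dplyr)

df %>% mutate(meets_criteria_1 = ifelse(Measure > 5, 1, 0)) %>%
  group_by(Date) %>%
  summarise(sum_criteria = sum(meets_criteria_1)) %>%  # hours meeting criterion 1 per day
  # criterion 2: each of the two previous days reached at least 6
  # (lag() steps by row, so this assumes one row per consecutive day, in date order)
  mutate(criteria2 = ifelse(lag(sum_criteria) >= 6 & lag(sum_criteria, 2) >= 6, 1, 0)) %>%
  right_join(df, by = "Date")  # attach the daily values back to the hourly rows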
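
Note that lag() on the daily summary steps back by rows, not by calendar days, so a missing day would silently shift the comparison. If gaps are possible, one option is to parse the dates and look up the two previous calendar dates explicitly. The following is only a sketch under assumptions: the `daily` table and its values are made up for illustration (05-01-2020 is deliberately missing), and the dates are assumed to be in day-month-year order as in the sample data.

```r
library(dplyr)

# Hypothetical daily summary; 05-01-2020 is deliberately absent.
daily <- tibble(
  Date = as.Date(
    c("01-01-2020", "02-01-2020", "03-01-2020", "04-01-2020",
      "06-01-2020", "07-01-2020", "08-01-2020"),
    format = "%d-%m-%Y"
  ),
  sum_criteria = c(17, 8, 9, 4, 7, 6, 10)
)

daily <- daily %>%
  mutate(
    # match() finds the row holding the wanted calendar date; NA if it is absent
    prev1 = sum_criteria[match(Date - 1, Date)],
    prev2 = sum_criteria[match(Date - 2, Date)],
    # 1 if both previous days reached 6, 0 if either is known to fall short,
    # NA if the outcome cannot be decided because a day is missing
    criteria2 = as.integer(prev1 >= 6 & prev2 >= 6)
  )

print(daily)
```

With this toy table, 07-01-2020 gets NA because 05-01-2020 has no row, whereas a row-based lag() would have compared it against 04-01-2020 instead.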

Sample data

df <- structure(list(Date = c("01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020"), Hour = c("00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00"), Measure = c(2L, 
14L, 4L, 19L, 0L, 15L, 13L, 17L, 3L, 19L, 0L, 17L, 15L, 14L, 
8L, 7L, 13L, 14L, 4L, 18L, 18L, 14L, 8L, 1L, 3L, 12L, 18L, 7L, 
13L, 15L, 12L, 17L, 2L, 8L, 1L, 18L, 19L, 14L, 2L, 7L, 12L, 17L, 
14L, 20L, 1L, 15L, 18L, 1L, 12L, 5L, 0L, 20L, 19L, 10L, 7L, 5L, 
8L, 8L, 0L, 15L, 16L, 20L, 14L, 18L, 17L, 3L, 15L, 14L, 4L, 17L, 
16L, 11L, 12L, 10L, 7L, 0L, 15L, 3L, 12L, 17L, 6L, 4L, 16L, 4L, 
15L, 0L, 5L, 7L, 6L, 3L, 15L, 10L, 12L, 19L, 13L, 3L, 18L, 14L, 
11L, 18L, 15L, 17L, 19L, 1L, 18L, 16L, 14L, 2L, 3L, 2L, 16L, 
10L, 2L, 12L, 10L, 7L, 5L, 9L, 12L, 17L)), class = "data.frame", row.names = c(NA, 
-120L))

Check the result

# A tibble: 120 x 5
   Date       sum_criteria criteria2 Hour     Measure
   <chr>             <dbl>     <dbl> <chr>      <int>
 1 01-01-2020           17        NA 00:00:00       2
 2 01-01-2020           17        NA 01:00:00      14
 3 01-01-2020           17        NA 02:00:00       4
 4 01-01-2020           17        NA 03:00:00      19
 5 01-01-2020           17        NA 04:00:00       0
 6 01-01-2020           17        NA 05:00:00      15
 7 01-01-2020           17        NA 06:00:00      13
 8 01-01-2020           17        NA 07:00:00      17
 9 01-01-2020           17        NA 08:00:00       3
10 01-01-2020           17        NA 09:00:00      19
# ... with 110 more rows
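
The NA values in criteria2 for the earliest days appear because there is no earlier day to look back on. If those days should count as "not met" rather than unknown, the `default` argument of dplyr's lag() can supply a fill value. Whether that is appropriate depends on the analysis, so treat this as a sketch; the `daily` table and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical daily sums for five consecutive days.
daily <- tibble(
  Date = c("01-01-2020", "02-01-2020", "03-01-2020", "04-01-2020", "05-01-2020"),
  sum_criteria = c(17, 17, 4, 12, 9)
)

daily <- daily %>%
  mutate(
    # default = 0 makes a missing earlier day fail the ">= 6" test instead of yielding NA
    criteria2 = as.integer(
      lag(sum_criteria, 1, default = 0) >= 6 &
        lag(sum_criteria, 2, default = 0) >= 6
    )
  )

print(daily)
```

Here only 03-01-2020 is flagged, since it is the only day whose two preceding days both reached 6.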