我有一个按日期排列的数据库表格,每小时测量一次。如果此度量大于 5,则满足标准 1(我已经在 dplyr 中使用 mutate 做到了这一点) 像这样:
日期 | 小时 | 测量 | Meets_criteria_1 |
---|---|---|---|
01/01/2020 | 00:00:00 | 7 | 1 |
01/01/2020 | 01:00:00 | 12 | 1 |
01/01/2020 | 02:00:00 | 3 | 0 |
…… | |||
10/01/2020 | 21:00:00 | 2 | 0 |
10/01/2020 | 22:00:00 | 15 | 1 |
10/01/2020 | 23:00:00 | 20 | 1 |
现在,我想知道在一天内满足条件的次数。为此,我将按天分组的“Meets_criteria_1”中的所有“1”相加:
日期 | 小时 | Meets_criteria_1 | Sum_criteria_1 |
---|---|---|---|
01/01/2020 | 00:00:00 | 1 | 3 |
01/01/2020 | 01:00:00 | 1 | 3 |
01/01/2020 | 02:00:00 | 0 | 3 |
…… | |||
10/01/2020 | 21:00:00 | 0 | 11 |
10/01/2020 | 22:00:00 | 1 | 11 |
10/01/2020 | 23:00:00 | 1 | 11 |
但是,我需要第三个条件来满足条件 2,即“在前两天,“Sum_criteria_1”至少为 6”
我的问题是: 我如何告诉 R 检查前两天到当前? 因此,例如,如果我查看 08/01/2020 日期,则仅当 sum_criteria_1 在 06/01/2020 和 07/01/2020 上至少为 6 时才满足条件。
编辑: 我尝试使用延迟,但我只是得到了 NA:
mutate(yesterday = lag(sum_criteria_1, n=1),
day_before = lag(sum_criteria_1, n = 2))
欢迎任何帮助! 谢谢
答案 0 :(得分:1)
使用这样的东西
df %>% mutate(meets_criteria_1 = ifelse(Measure > 5, 1, 0)) %>%
group_by(Date) %>%
summarise(sum_criteria = sum(meets_criteria_1)) %>%
mutate(criteria2 = ifelse(lag(sum_criteria) >= 6 & lag(sum_criteria,2), 1, 0)) %>%
right_join(df, by = "Date")
df <- structure(list(Date = c("01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020"), Hour = c("00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00"), Measure = c(2L,
14L, 4L, 19L, 0L, 15L, 13L, 17L, 3L, 19L, 0L, 17L, 15L, 14L,
8L, 7L, 13L, 14L, 4L, 18L, 18L, 14L, 8L, 1L, 3L, 12L, 18L, 7L,
13L, 15L, 12L, 17L, 2L, 8L, 1L, 18L, 19L, 14L, 2L, 7L, 12L, 17L,
14L, 20L, 1L, 15L, 18L, 1L, 12L, 5L, 0L, 20L, 19L, 10L, 7L, 5L,
8L, 8L, 0L, 15L, 16L, 20L, 14L, 18L, 17L, 3L, 15L, 14L, 4L, 17L,
16L, 11L, 12L, 10L, 7L, 0L, 15L, 3L, 12L, 17L, 6L, 4L, 16L, 4L,
15L, 0L, 5L, 7L, 6L, 3L, 15L, 10L, 12L, 19L, 13L, 3L, 18L, 14L,
11L, 18L, 15L, 17L, 19L, 1L, 18L, 16L, 14L, 2L, 3L, 2L, 16L,
10L, 2L, 12L, 10L, 7L, 5L, 9L, 12L, 17L)), class = "data.frame", row.names = c(NA,
-120L))
# A tibble: 120 x 5
Date sum_criteria criteria2 Hour Measure
<chr> <dbl> <dbl> <chr> <int>
1 01-01-2020 17 NA 00:00:00 2
2 01-01-2020 17 NA 01:00:00 14
3 01-01-2020 17 NA 02:00:00 4
4 01-01-2020 17 NA 03:00:00 19
5 01-01-2020 17 NA 04:00:00 0
6 01-01-2020 17 NA 05:00:00 15
7 01-01-2020 17 NA 06:00:00 13
8 01-01-2020 17 NA 07:00:00 17
9 01-01-2020 17 NA 08:00:00 3
10 01-01-2020 17 NA 09:00:00 19
# ... with 110 more rows