Lowest mean and standard deviation over n consecutive minutes for each time period

Time: 2019-03-16 17:50:37

Tags: r

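New R user here looking for guidance. I'm working with a 15-minute dataset and would like to compute the following for each variable (a building, in my case) and each day of the year: (1) the lowest mean of `value` over n consecutive rows (ideally spanning 2 or 3 hours), and (2) the standard deviation over the same period.

Sample df: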

    variable    year month day hr min   date     value
    building_a  2018    6   2   0   0   6/2/2018    19
    building_a  2018    6   2   0   15  6/2/2018    19
    building_a  2018    6   2   0   30  6/2/2018    19
    building_a  2018    6   2   0   45  6/2/2018    17
    building_a  2018    6   2   1   0   6/2/2018    17
    building_a  2018    6   2   1   15  6/2/2018    15
    building_a  2018    6   2   1   30  6/2/2018    15
    building_a  2018    6   2   1   45  6/2/2018    14
    building_a  2018    6   2   2   0   6/2/2018    14
    building_a  2018    6   2   2   15  6/2/2018    13
    building_a  2018    6   2   2   30  6/2/2018    13
    building_a  2018    6   2   2   45  6/2/2018    13
    building_a  2018    6   2   3   0   6/2/2018    12
    building_a  2018    6   2   3   15  6/2/2018    14
    building_a  2018    6   2   3   30  6/2/2018    13
    building_a  2018    6   2   3   45  6/2/2018    13
    building_b  2018    6   2   0   0   6/2/2018    37
    building_b  2018    6   2   0   15  6/2/2018    41
    building_b  2018    6   2   0   30  6/2/2018    38
    building_b  2018    6   2   0   45  6/2/2018    39
    building_b  2018    6   2   1   0   6/2/2018    37
    building_b  2018    6   2   1   15  6/2/2018    36
    building_b  2018    6   2   1   30  6/2/2018    34
    building_b  2018    6   2   1   45  6/2/2018    34
    building_b  2018    6   2   2   0   6/2/2018    35
    building_b  2018    6   2   2   15  6/2/2018    35
    building_b  2018    6   2   2   30  6/2/2018    29
    building_b  2018    6   2   2   45  6/2/2018    32
    building_b  2018    6   2   3   0   6/2/2018    30
    building_b  2018    6   2   3   15  6/2/2018    33
    building_b  2018    6   2   3   30  6/2/2018    30
    building_b  2018    6   2   3   45  6/2/2018    32

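I can already do this for one-hour periods using the approach below, but I can't figure out how to adapt it to a larger window (for example, the lowest 135-minute mean instead of the lowest 60-minute mean).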

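    library(dplyr)

    # Mean and sd of value per building, date and clock hour
    # (the hour column in the sample df is named hr)
    tmp <- aggregate(value~variable+date+hr, df, 
                                   function(x) 
                                       c(mean = mean(x), sd = sd(x)))

    # Flatten the matrix column into value.mean and value.sd
    tmp2 <- do.call("data.frame",tmp)
    tmp2$value.mean <- as.numeric(tmp2$value.mean)
    tmp2$value.sd <- as.numeric(tmp2$value.sd)

    # Keep the hour with the lowest mean per building and date;
    # ties are broken by the smallest sd
    tmp2_flat <- tmp2 %>%
      group_by(variable, date) %>%
      filter(value.mean == min(value.mean)) %>%
      arrange(variable, date, value.sd) %>%
      slice(1)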

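Thanks in advance for any suggestions.

1 Answer:

Answer 0: (score: 0)

I played around with this a bit, and here is what I came up with:

Update: my previous answer didn't quite work. I got no feedback, but I'm changing it anyway.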

    library(zoo)
    library(dplyr)

    df %>%
      group_by(variable, date) %>%
      # Trailing rolling mean and sd over 4 rows (4 * 15 minutes = 1 hour)
      mutate(minimum =  rollapply(value, width = 4, FUN = mean, fill = NA, align = "right"),
             sd = rollapply(value, width = 4, FUN = sd, fill = NA, align = "right")) %>%
      # Keep the row where the lowest rolling mean ends, per building and date
      slice(which.min(minimum))

    # A tibble: 2 x 10
    # Groups:   variable, date [2]
      variable    year month   day    hr   min date     value minimum    sd
      <fct>      <int> <int> <int> <int> <int> <fct>    <int>   <dbl> <dbl>
    1 building_a  2018     6     2     3     0 6/2/2018    12    12.8  0.5 
    2 building_b  2018     6     2     2    30 6/2/2018    29    33.2  2.87
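To make the trailing window in the code above more concrete, here is a minimal base R sketch of the same calculation on a single vector. It uses `embed()` in place of `rollapply()`, which is a swapped-in technique for illustration only. The vector holds the first eight `building_a` readings from the sample df, and the names `windows`, `roll_mean` and `roll_sd` are made up for this example.

```r
# Toy data: the first eight building_a readings from the sample df.
value <- c(19, 19, 19, 17, 17, 15, 15, 14)
width <- 4  # 4 rows * 15 minutes = 1 hour

# embed() returns one row per complete window of `width` consecutive values.
windows <- embed(value, width)

# Pad the front with NA so each result sits on the last row of its window,
# mirroring fill = NA, align = "right" in rollapply().
roll_mean <- c(rep(NA, width - 1), rowMeans(windows))
roll_sd   <- c(rep(NA, width - 1), apply(windows, 1, sd))

# Index of the row where the lowest trailing mean ends (NA entries are skipped).
lowest_row <- which.min(roll_mean)
```

The first `width - 1` rows have no complete window behind them, which is why they come out as `NA` and are ignored by `which.min()`.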

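The idea is still the same as in my earlier answer. In the `rollapply()` function, the `width` argument sets the number n of consecutive rows in each window. Here `width = 4` means 4 * 15 minutes = 1 hour, but it can be any number of quarter-hours; a 135-minute window, for example, is 135 / 15 = 9 rows. For each row, it computes a "moving average" by looking back over the last `width` rows of `value`. With `fill = NA` and `align = "right"`, rows that do not yet have a full window behind them get `NA`.

I hope this helps.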
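As a rough sketch of how the window could be expressed in minutes instead of rows, the pipeline can be wrapped in a small function. This assumes zoo and dplyr are installed and that the readings are evenly spaced. `lowest_window()`, `window_minutes`, `step_minutes` and `n_rows` are hypothetical names, not part of either package, and the toy `df` only holds the `building_a` values from the sample.

```r
library(zoo)
library(dplyr)

# Toy data: the sixteen building_a readings from the sample df.
df <- data.frame(
  variable = "building_a",
  date     = "6/2/2018",
  value    = c(19, 19, 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 12, 14, 13, 13)
)

# Hypothetical helper (not part of zoo or dplyr): lowest trailing rolling mean
# per variable and date, plus the standard deviation of that same window.
lowest_window <- function(data, window_minutes, step_minutes = 15) {
  # The window must be a whole number of rows, e.g. 135 / 15 = 9.
  stopifnot(window_minutes %% step_minutes == 0)
  n_rows <- window_minutes / step_minutes

  data %>%
    group_by(variable, date) %>%
    mutate(
      minimum = rollapply(value, width = n_rows, FUN = mean, fill = NA, align = "right"),
      sd      = rollapply(value, width = n_rows, FUN = stats::sd, fill = NA, align = "right")
    ) %>%
    slice(which.min(minimum)) %>%
    ungroup()
}

# Lowest 135-minute mean (9 rows of 15 minutes each).
result <- lowest_window(df, window_minutes = 135)
```

The returned row is the last row of the lowest window, so with the full sample df its `hr` and `min` columns would mark the end of that period, not its start.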