lubridate - group values by hour and compute the mean

Time: 2016-11-09 23:39:17

Tags: r dataframe lubridate

I have the following data.frame:

df = read.csv(text = 'date,      no,      no2,      nox,
              2015-10-16 00:00:00, 1.10979, 14.50249, 16.20413,
              2015-10-16 01:00:00, 1.73032, 13.60122, 16.25434,
              2015-10-17 00:00:00, 1.30592, 11.20056, 13.20294,
              2015-10-17 01:00:00, 2.05711, 11.34973, 14.50392,
              2015-10-18 00:00:00, 4.14603, 16.79844, 23.15559,
              2015-10-18 01:00:00, 7.73731, 24.74488, 36.60860')
df = df[,-c(5)]

I need to compute, for every variable, the mean for each hour of the day across the three days.

I tried this, but it does not work:

data_0 = df[hours(df$date) %in% 0,]
data_1 = df[hours(df$date) %in% 1,]

.....

Any suggestions?

The output should be a data frame that holds, for each variable, the mean for each hour over the three-day range.

> class(df$date)
[1] "POSIXlt" "POSIXt" 
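
One likely reason the attempt above fails: in lubridate, hours() builds a Period object (a span of n hours), whereas hour() extracts the hour component from a date-time. The sketch below assumes that is the only problem and keeps the subset-per-hour idea from the question. The data frame is rebuilt by hand with the question's values (an assumption made here to sidestep CSV parsing), and the timestamps are assumed to be in UTC.

```r
library(lubridate)

# Same values as in the question, built directly (assumed UTC timestamps)
df <- data.frame(
  date = as.POSIXct(c("2015-10-16 00:00:00", "2015-10-16 01:00:00",
                      "2015-10-17 00:00:00", "2015-10-17 01:00:00",
                      "2015-10-18 00:00:00", "2015-10-18 01:00:00"),
                    tz = "UTC"),
  no  = c(1.10979, 1.73032, 1.30592, 2.05711, 4.14603, 7.73731),
  no2 = c(14.50249, 13.60122, 11.20056, 11.34973, 16.79844, 24.74488),
  nox = c(16.20413, 16.25434, 13.20294, 14.50392, 23.15559, 36.60860)
)

# hour() extracts the hour of day; hours() would construct a Period instead
data_0 <- df[hour(df$date) == 0, ]
data_1 <- df[hour(df$date) == 1, ]

# Mean of every measurement column within each hourly subset
means_0 <- colMeans(data_0[, c("no", "no2", "nox")])
means_1 <- colMeans(data_1[, c("no", "no2", "nox")])
```

This works for two hours, but it needs one subset per hour of the day, so a grouped summary scales better for a full 24-hour profile.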

3 answers:

Answer 0 (score: 0)

Since your dataset was not provided in a reproducible format, I am using a dataset from library(openair).

library(data.table)

data(mydata, package = "openair")

# Reshape to long format, then average each variable by hour of day
melt(setDT(mydata), id.vars = "date")[, .(
  avg = mean(value, na.rm = TRUE)
), by = .(hour(date), variable)]
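
The same melt-then-aggregate pattern can be tried on the values from the question. This is a minimal sketch, not the answerer's code: the table is built by hand (an assumption made here to avoid CSV parsing), the timestamps are assumed to be in UTC, and the names dt, long and hourly are illustrative.

```r
library(data.table)

# The question's values, built directly as a data.table (assumed UTC)
dt <- data.table(
  date = as.POSIXct(c("2015-10-16 00:00:00", "2015-10-16 01:00:00",
                      "2015-10-17 00:00:00", "2015-10-17 01:00:00",
                      "2015-10-18 00:00:00", "2015-10-18 01:00:00"),
                    tz = "UTC"),
  no  = c(1.10979, 1.73032, 1.30592, 2.05711, 4.14603, 7.73731),
  no2 = c(14.50249, 13.60122, 11.20056, 11.34973, 16.79844, 24.74488),
  nox = c(16.20413, 16.25434, 13.20294, 14.50392, 23.15559, 36.60860)
)

# Long format: one row per (date, variable) pair
long <- melt(dt, id.vars = "date")

# Average per hour of day and variable
hourly <- long[, .(avg = mean(value, na.rm = TRUE)),
               by = .(hour = hour(date), variable)]

# Optional: reshape back to one row per hour, one column per variable
wide <- dcast(hourly, hour ~ variable, value.var = "avg")
```

The long result has one row per hour and variable; the dcast() step is only needed if the wide layout described in the question is preferred.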

Answer 1 (score: 0)

{{1}}

Repeat points 2 and 3 for every variable of interest.

Answer 2 (score: 0)

Here is a tidyverse example that should work. The pattern is easy to repeat.

library(lubridate)
library(tidyverse)

df = read.csv(text = 'date,      no,      no2,      nox,
              2015-10-16 00:00:00, 1.10979, 14.50249, 16.20413,
              2015-10-16 01:00:00, 1.73032, 13.60122, 16.25434,
              2015-10-17 00:00:00, 1.30592, 11.20056, 13.20294,
              2015-10-17 01:00:00, 2.05711, 11.34973, 14.50392,
              2015-10-18 00:00:00, 4.14603, 16.79844, 23.15559,
              2015-10-18 01:00:00, 7.73731, 24.74488, 36.60860')
df = df[,-c(5)]

df %>% 
  mutate(date = ymd_hms(date),
         hour = hour(date)) %>% 
  group_by(hour) %>% 
  summarise(mean_no = mean(no),
            mean_no2 = mean(no2))
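
To cover every variable without writing one mean() per column, the summary step can be expressed with across(). This is a sketch under assumptions: it needs dplyr 1.0.0 or later, the data frame is rebuilt by hand with the question's values, and hourly_means is an illustrative name.

```r
library(dplyr)
library(lubridate)

# The question's values, with the date still stored as text
df <- data.frame(
  date = c("2015-10-16 00:00:00", "2015-10-16 01:00:00",
           "2015-10-17 00:00:00", "2015-10-17 01:00:00",
           "2015-10-18 00:00:00", "2015-10-18 01:00:00"),
  no  = c(1.10979, 1.73032, 1.30592, 2.05711, 4.14603, 7.73731),
  no2 = c(14.50249, 13.60122, 11.20056, 11.34973, 16.79844, 24.74488),
  nox = c(16.20413, 16.25434, 13.20294, 14.50392, 23.15559, 36.60860),
  stringsAsFactors = FALSE
)

hourly_means <- df %>%
  mutate(date = ymd_hms(date),   # parse the text into date-times
         hour = hour(date)) %>%  # extract the hour of day
  group_by(hour) %>%
  # Apply mean() to each listed column; results are named mean_<column>
  summarise(across(c(no, no2, nox), mean, .names = "mean_{.col}"))
```

The result has one row per hour and the columns hour, mean_no, mean_no2 and mean_nox, which matches the shape requested in the question.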