For hourly data, get the maximum value for each day

Time: 2016-05-14 12:18:52

Tags: r

Suppose I have the following data:

time <- seq(ISOdate(2007,7,1,0), ISOdate(2008,4,5,23), by = "1 hour")
y <- rnorm(n = length(time))

# format() extracts each field directly from the date-time values
year  <- as.numeric(format(time, "%Y"))  # year number as numeric
month <- as.numeric(format(time, "%m"))  # month number as numeric
day   <- as.numeric(format(time, "%d"))  # day number as numeric
hour  <- as.numeric(format(time, "%H"))  # hour number as numeric

dat <- data.frame(year=year, month=month, day=day, hour=hour, y = y)

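For each day there are 24 hourly y values (hours 0 to 23). I need to find the maximum of y for each day. That is, for the date "2007-10-05" there are 24 hourly y values (hours 0 to 23), and I need the single largest one for "2007-10-05". Since the data cover 280 calendar days from "2007-07-01" to "2008-04-05" inclusive (2008 is a leap year), I should end up with 280 maximum y values.

How can I do this?

3 Answers:

Answer 0: (score: 3)

Using dplyr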

library(dplyr)
dyp1 <- dat %>% 
        group_by(year, month, day) %>% 
        summarise(y=max(y))

Using data.table

library(data.table)
setDT(dat)[, .(y=max(y)), by = .(year, month, day)]

Using base R

aggregate( y ~ year+month+day, dat, max)
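
If you would rather group on a single date column than on separate year, month and day columns, the same `aggregate()` idea can be sketched as below. This is only an illustration: the `date` column and the `daily_max` name are not part of the question's data, and the seed is added just to make the run reproducible.

```r
# Rebuild the example data from the question (seeded so it is reproducible)
set.seed(1)
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
dat <- data.frame(time = time, y = rnorm(length(time)))

# Collapse each timestamp to its calendar date; ISOdate() returns GMT times,
# so the time zone is passed explicitly to keep the day boundaries aligned
dat$date <- as.Date(dat$time, tz = "GMT")

# One row per date holding the largest hourly y
daily_max <- aggregate(y ~ date, data = dat, FUN = max)
head(daily_max)
```

The result has one row per calendar day, sorted by date, which can be easier to plot or join than three separate grouping columns.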

Answer 1: (score: 2)

Using sqldf

library(sqldf)
sqldf("select year, month, day, 
       max(y) as y 
       from dat 
       group by year, month, day") 

Another option is to order by 'y' in descending order and take the first value in each group

library(data.table)
setDT(dat)[order(-y), .(y= y[1L]), by = .(year, month, day)]

The same with dplyr

library(dplyr)
dat %>%
    group_by(year, month, day) %>%
    arrange(desc(y)) %>%
    summarise(y = first(y))  
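
One reason to prefer the order-and-take-first approach over a plain `max()` is that it can keep the whole row, for example the hour at which the maximum occurred. A rough base R sketch of that idea follows. The `date` column and the names `sorted` and `top_rows` are made up for this illustration, and when a day has tied maxima it simply keeps whichever row sorts first.

```r
# Example data as in the question, with a date and an hour column
set.seed(1)
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
dat <- data.frame(date = as.Date(time, tz = "GMT"),
                  hour = as.integer(format(time, "%H")),
                  y = rnorm(length(time)))

# Sort by date, then by y descending, so each day's largest y comes first
sorted <- dat[order(dat$date, -dat$y), ]

# Keep only the first row of each date: the row holding that day's maximum
top_rows <- sorted[!duplicated(sorted$date), ]
head(top_rows)
```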

Answer 2: (score: 1)

Apply cut directly to the time and y vectors, so that each timestamp is binned into its day:

tapply(y, INDEX =cut(time, breaks="day"), max)

Or using the dplyr library:

library(dplyr)
df<-data.frame(time, y)
summarize(group_by(df, cut(df$time, breaks="day")), max(y))
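
`tapply()` returns a named vector rather than a data frame. If a data frame is more convenient, something like the following sketch may help. It assumes the labels produced by `cut(..., breaks = "day")` are plain dates that `as.Date()` can parse, and the names `daily` and `res` are introduced here only for illustration.

```r
# Example data as in the question (seeded so it is reproducible)
set.seed(1)
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
y <- rnorm(length(time))

# Daily maxima as a named vector; the names are the day labels from cut()
daily <- tapply(y, INDEX = cut(time, breaks = "day"), max)

# Turn the names into a Date column and the values into a numeric column
res <- data.frame(date = as.Date(names(daily)), y = as.vector(daily))
head(res)
```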