Suppose I have the following data:
time <- seq(ISOdate(2007,7,1,0), ISOdate(2008,4,5,23), by = "1 hour")
y <- rnorm(n = length(time))
year <- as.numeric(format(time, "%Y")) # year as numeric
month <- as.numeric(format(time, "%m")) # month as numeric
day <- as.numeric(format(time, "%d")) # day of month as numeric
hour <- as.numeric(format(time, "%H")) # hour (0 to 23) as numeric
dat <- data.frame(year=year, month=month, day=day, hour=hour, y = y)
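For each day there are 24 hourly values of y (hours 0 to 23). I need to find the maximum value of y for each day. For example, the date "2007-10-05" has 24 values of y (hours 0 to 23), and I need the maximum of those values for "2007-10-05". There are 280 days from "2007-07-01" to "2008-04-05" inclusive, so I should end up with 280 maximum y values.
How can I do this?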
Answer 0 (score: 3)
Using dplyr:
library(dplyr)
dyp1 <- dat %>%
  group_by(year, month, day) %>%
  summarise(y = max(y))
Using data.table:
library(data.table)
setDT(dat)[, .(y=max(y)), by = .(year, month, day)]
Using base R:
aggregate( y ~ year+month+day, dat, max)
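As a quick sanity check of the grouping, the sketch below rebuilds a three-day version of the question's data and confirms that the base R call returns one row per day, each holding that day's maximum. The fixed seed, the shortened date range, and the name daily_max are assumptions made only for this illustration.

```r
# Assumption: a fixed seed and a 3-day range, chosen only to keep the check small.
set.seed(1)
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2007, 7, 3, 23), by = "1 hour")
dat <- data.frame(
  year  = as.numeric(format(time, "%Y")),
  month = as.numeric(format(time, "%m")),
  day   = as.numeric(format(time, "%d")),
  hour  = as.numeric(format(time, "%H")),
  y     = rnorm(length(time))
)

# Same call as above: one maximum per (year, month, day) group.
daily_max <- aggregate(y ~ year + month + day, dat, max)

# Compare one group against a maximum computed by hand for that day.
manual_day2 <- max(dat$y[dat$day == 2])
print(daily_max)
print(daily_max$y[daily_max$day == 2] == manual_day2)
```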
Answer 1 (score: 2)
Using sqldf:
library(sqldf)
sqldf("select year, month, day,
              max(y) as y
       from dat
       group by year, month, day")
Another option is to sort by 'y' in descending order and take the first value in each group, with data.table:
library(data.table)
setDT(dat)[order(-y), .(y= y[1L]), by = .(year, month, day)]
Or with dplyr:
library(dplyr)
dat %>%
  group_by(year, month, day) %>%
  arrange(desc(y)) %>%
  summarise(y = first(y))
Answer 2 (score: 1)
Apply cut directly to the time and y vectors:
tapply(y, INDEX =cut(time, breaks="day"), max)
Or using the dplyr library:
library(dplyr)
df <- data.frame(time, y)
summarize(group_by(df, day = cut(time, breaks = "day")), y = max(y))
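A related base R variant, sketched here as one possible alternative rather than a preferred method, groups on as.Date(time) instead of cut(). It assumes the timestamps are in GMT, which is what ISOdate() produces by default. The seed and the name daily are illustrative only.

```r
# Assumption: fixed seed so the example is repeatable.
set.seed(1)
time <- seq(ISOdate(2007, 7, 1, 0), ISOdate(2008, 4, 5, 23), by = "1 hour")
y <- rnorm(n = length(time))

# Convert each timestamp to its calendar date (in GMT) and take the
# maximum of y within each date. The result is a named vector whose
# names are the dates as "YYYY-MM-DD" strings.
daily <- tapply(y, as.Date(time, tz = "GMT"), max)

head(daily)
```

The names of the result are plain date strings, which can be convenient for looking up a single day, for example daily["2007-10-05"].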