I have a data frame with a long list of dates in one column and values in another, like this:
set.seed(1234)
df <- data.frame(date= as.Date(c('2010-09-05', '2011-09-06', '2010-09-13',
'2011-09-14', '2010-09-23', '2011-09-24',
'2010-10-05', '2011-10-06', '2010-10-13',
'2011-10-14', '2010-10-23', '2011-10-24')),
value= rnorm(12))
I need to compute the mean for each 10-day period of each month, regardless of the year, like this:
dfNeeded <- data.frame(datePeriod=c('period.Sept0.10', 'period.Sept11.20', 'period.Sept21.30',
'period.Oct0.10', 'period.Oct11.20', 'period.Oct21.31'),
meanValue=c(mean(df$value[c(1,2)]),
mean(df$value[c(3,4)]),
mean(df$value[c(5,6)]),
mean(df$value[c(7,8)]),
mean(df$value[c(9,10)]),
mean(df$value[c(11,12)])))
Is there a fast way to do this?
Answer 0 (score: 5)
Here is one approach. It uses the lubridate package to extract the month and day, but you could do the same with base R date functions:
library(lubridate)
df$period <- paste(month(df$date),cut(day(df$date),breaks=c(0,10,20,31)),sep="-")
aggregate(df$value, list(period=df$period), mean)
which gives:
      period          x
1  10-(0,10] -0.5606859
2 10-(10,20] -0.7272449
3 10-(20,31] -0.7377896
4   9-(0,10] -0.4648183
5  9-(10,20] -0.6306283
6  9-(20,31]  0.4675903
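The answer mentions that base R date functions would also work. A minimal sketch of what that could look like, assuming the same `df` as in the question; the helper names `mon` and `dom` are introduced here for illustration and are not part of the original answer:

```r
set.seed(1234)
df <- data.frame(date = as.Date(c('2010-09-05', '2011-09-06', '2010-09-13',
                                  '2011-09-14', '2010-09-23', '2011-09-24',
                                  '2010-10-05', '2011-10-06', '2010-10-13',
                                  '2011-10-14', '2010-10-23', '2011-10-24')),
                 value = rnorm(12))

# Month and day of month as integers, using format() instead of lubridate
mon <- as.integer(format(df$date, "%m"))
dom <- as.integer(format(df$date, "%d"))

# Same binning as above: (0,10], (10,20], (20,31]
df$period <- paste(mon, cut(dom, breaks = c(0, 10, 20, 31)), sep = "-")

# Mean value per month-and-period label, ignoring the year
res <- aggregate(value ~ period, data = df, FUN = mean)
res
```

The period labels should match the lubridate version, since only the way the month and day are extracted changes.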
Answer 1 (score: 2)
This approach, using format.Date and integer division (%/%), should be fairly fast:
tapply(df$value, list( format(df$date, "%b"), as.POSIXlt(df$date)$mday %/% 10), mean)
            0         1        2
Oct -0.560686 -0.727245 -0.73779
Sep -0.464818 -0.630628  0.46759
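One caveat worth checking: `mday %/% 10` puts day 10 in bin 1, day 20 in bin 2, and days 30 and 31 in a fourth bin 3, so the bins are not exactly 1-10, 11-20 and 21-end of month. The sample data has no dates on those boundaries, which is why the results agree with the other answer. A small sketch of the difference, with `naive` and `dekad` as illustrative names and the shifted formula as one possible fix rather than part of the original answer:

```r
# Days of the month around the bin boundaries
mday <- c(1, 9, 10, 11, 19, 20, 21, 29, 30, 31)

# Plain integer division: day 10 lands in bin 1, days 30 and 31 in bin 3
naive <- mday %/% 10

# Shift by one and cap at 2, so the bins are 1-10, 11-20, 21-end of month
dekad <- pmin((mday - 1) %/% 10, 2)

data.frame(mday, naive, dekad)
```

If days 10, 20, 30 and 31 should be grouped the way the question describes, the shifted version can replace `as.POSIXlt(df$date)$mday %/% 10` in the calls in this answer.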
I'm not sure how it compares with the aggregate approach:
aggregate(df$value, list( format(df$date, "%b"), as.POSIXlt(df$date)$mday %/% 10), mean)
  Group.1 Group.2         x
1     Oct       0 -0.560686
2     Sep       0 -0.464818
3     Oct       1 -0.727245
4     Sep       1 -0.630628
5     Oct       2 -0.737790
6     Sep       2  0.467590
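Two details of the output above may matter in practice: `format(df$date, "%b")` depends on the current locale, and the groups come back in alphabetical order (Oct before Sep). A hedged sketch that uses the built-in `month.abb` constant for English month names and sorts the result chronologically, so the rows line up with `dfNeeded` from the question. The names `lt`, `month` and `dekad` are illustrative, and the binning is shifted so that it covers days 1-10, 11-20 and 21-end of month:

```r
set.seed(1234)
df <- data.frame(date = as.Date(c('2010-09-05', '2011-09-06', '2010-09-13',
                                  '2011-09-14', '2010-09-23', '2011-09-24',
                                  '2010-10-05', '2011-10-06', '2010-10-13',
                                  '2011-10-14', '2010-10-23', '2011-10-24')),
                 value = rnorm(12))

lt <- as.POSIXlt(df$date)

# lt$mon is 0-based, so add 1 to index the English abbreviations in month.abb
month <- month.abb[lt$mon + 1]

# 0 = days 1-10, 1 = days 11-20, 2 = days 21 to end of month
dekad <- pmin((lt$mday - 1) %/% 10, 2)

res <- aggregate(df$value, list(month = month, dekad = dekad), mean)

# Sort by calendar month, then by 10-day period within the month
res <- res[order(match(res$month, month.abb), res$dekad), ]
res
```

For this sample data the six means are the same as in the outputs above, only in Sep, Oct order.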