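I understand why vectorized functions are better than for loops.
But for some problems I can't see a vectorized, functional-programming solution. One of them is summing monthly data to get quarterly figures. Any suggestions for replacing this code...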
month <- 1:100
A422072L <- c(rep(NA, 4), rnorm(96, 100, 5) ) + 2 * month
A422070J <- c(NA, NA, rnorm(96, 100, 5), NA, NA) + 2 * month
Au.approvals <- data.frame(month=month, A422072L=A422072L, A422070J=A422070J)
Au.approvals$trend.sum.A422072L.qtr <- NA
Au.approvals$sa.sum.A422070J.qtr <- NA
for (i in seq_len(nrow(Au.approvals)))
{
  if (i < 3) next
  if (all(!is.na(Au.approvals$A422072L[(i-2):i])))
    Au.approvals$trend.sum.A422072L.qtr[i] <- sum(Au.approvals$A422072L[(i-2):i])
  if (all(!is.na(Au.approvals$A422070J[(i-2):i])))
    Au.approvals$sa.sum.A422070J.qtr[i] <- sum(Au.approvals$A422070J[(i-2):i])
}
print(Au.approvals)
There is now enough data to run this as an example.
Answer 0 (score: 4)
Let's create a fake time series:
time_dat = data.frame(t = 1:100, value = runif(100))
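To get a rolling sum, take a look at rollapply from the zoo package:
library(zoo)
# fill = NA pads the positions that lack a full 10-value window
time_dat = transform(time_dat,
roll_value = rollapply(value, 10, sum, fill = NA))
Here I assume that the coarser resolution is 10 times coarser than the fine one. For monthly data summed into quarters, the window width would be 3.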
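As a rough sketch of how this might replace the loop in the question: the trailing three-month sum can be written with rollapplyr, the right-aligned variant of rollapply in zoo. The seed is an arbitrary choice added here for reproducibility, and the sketch assumes zoo is installed.

```r
library(zoo)  # assumed to be installed

set.seed(1)   # arbitrary seed, only so the sketch is reproducible
month <- 1:100
A422072L <- c(rep(NA, 4), rnorm(96, 100, 5)) + 2 * month
A422070J <- c(NA, NA, rnorm(96, 100, 5), NA, NA) + 2 * month
Au.approvals <- data.frame(month = month, A422072L = A422072L, A422070J = A422070J)

# Trailing 3-month sum: the value at row i covers rows (i-2):i.
# sum() returns NA whenever the window contains an NA, and fill = NA
# pads the first two rows, which have no complete window.
Au.approvals$trend.sum.A422072L.qtr <- rollapplyr(Au.approvals$A422072L, 3, sum, fill = NA)
Au.approvals$sa.sum.A422070J.qtr <- rollapplyr(Au.approvals$A422070J, 3, sum, fill = NA)

print(head(Au.approvals, 8))
```

Because sum() propagates NA, no explicit all(!is.na(...)) check should be needed; a window that touches a missing month simply yields NA, as in the original loop.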
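Original answer, for the non-rolling case:
I like to use functions from the plyr package, but ave, aggregate, and data.table are also good options. For large datasets, data.table is fast. But back to some plyr magic:
First create an extra column that specifies the coarser time frequency, i.e. which quarter each observation falls in: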
time_dat[["coarse_t"]] = rep(1:10, each = 10)
> head(time_dat)
t value coarse_t
1 1 0.9045097 1
2 2 0.4174182 1
3 3 0.5638139 1
4 4 0.8228698 1
5 5 0.7059027 1
6 6 0.5285386 1
Now we can aggregate time_dat to the coarser time frequency:
library(plyr)
time_dat_coarse = ddply(time_dat, .(coarse_t), summarise, sum_value = sum(value))
> time_dat_coarse
coarse_t sum_value
1 1 6.097348
2 2 4.834720
3 3 3.988809
4 4 4.170656
5 5 4.538269
6 6 6.198716
7 7 4.399282
8 8 5.507384
9 9 6.089072
10 10 4.663287
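For comparison, the same non-rolling aggregation can be sketched in base R with aggregate, one of the alternatives mentioned above. The data frame monthly, its column names, and the seed are made up for illustration; the grouping index assumes the months are numbered consecutively from 1.

```r
set.seed(42)  # arbitrary seed for reproducibility
monthly <- data.frame(month = 1:24, value = runif(24))

# Integer division maps months 1-3 to quarter 1, months 4-6 to quarter 2, ...
monthly$quarter <- (monthly$month - 1) %/% 3 + 1

# One row per quarter, holding the sum of its three monthly values
quarterly <- aggregate(value ~ quarter, data = monthly, FUN = sum)
print(quarterly)
```

Computing the index with %/% rather than rep() avoids hard-coding the number of groups, which may be convenient when the series length changes.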
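Answer 1 (score: 1)
Paul's answer is great, but I just want to add that the chron package has many excellent operations for classifying dates and times, which can be paired with plyr for aggregation.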
library("chron")
# chron uses its own object representation.
# If a different representation is needed, a conversion is necessary,
# e.g. if a$date is a chron date object, I would use as.POSIXct(a$date) to get a POSIXct representation
# create chron date objects and values
a<-data.frame(date=as.chron(Sys.Date() + 1:1000), value = 1:100*runif(100,0,1))
# cuts dates into 15 intervals
a$interval1<-cut(a$date,15)
# cuts dates into 10 intervals using labels you define
a$interval2<-cut(a$date,10,paste("group",1:10))
# cuts dates into weeks
a$weeks<-cut(a$date,"weeks",start.on.monday=FALSE)
# cuts dates into months
a$months<-cut(a$date,"months")
# cuts dates into years
a$years<-cut(a$date,"years")
# classifies day based on day of week
a$day_of_week<-day.of.week(a$date)
# creating a chron time object
b<-data.frame(day_time=as.chron(Sys.time()+1:1000*100), value = 1:100*runif(100,0,1))
# cuts times into days - note: uses first time period as the start
b$day<-cut(b$day_time,"days")
# truncates time to 5 minute interval
b$min_5<-trunc(b$day_time, "00:05:00")
# truncates time to 1 hour intervals
b$hour1<-trunc(b$day_time, "01:00:00")
# truncates datetime to 1 hour and 2 second intervals
b$days_3<-trunc(b$day_time, "01:00:02")
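I use chron a lot because it makes time aggregation much easier.
For extra awesomeness, the zoo and xts packages have many more functions that are well suited to all kinds of aggregation at the daily level of detail. Their documentation is huge and it can be hard to find what you want, but almost anything you might want is in there. Some highlights: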
library("zoo")
library("xts")
?rollapply
?rollsum
?rollmean
?rollmedian
?rollmax
?yearmon
?yearqtr
?apply.daily
?apply.weekly
?apply.monthly
?apply.quarterly
?apply.yearly
?to.minutes
?to.minutes3
?to.minutes5
?to.minutes10
?to.minutes15
?to.minutes30
?to.hourly
?to.daily
?to.weekly
?to.monthly
?to.quarterly
?to.yearly
?to.period
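As one possible illustration of the yearqtr class listed above, a zoo series with a monthly Date index can be summed by calendar quarter by passing as.yearqtr to aggregate. The dates, values, and seed below are invented for the sketch, and zoo is assumed to be installed.

```r
library(zoo)  # assumed to be installed

set.seed(7)   # arbitrary seed for reproducibility
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
z <- zoo(runif(12), dates)

# aggregate() on a zoo object applies as.yearqtr to the index
# and sums the values that fall in each calendar quarter
q <- aggregate(z, as.yearqtr, sum)
print(q)
```

Unlike a hand-built group index, this keys the result by actual calendar quarter, so it should still group correctly if the series starts mid-quarter.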