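I have an xts object that contains time series for multiple stock symbols. I need to split the xts object into symbol-specific subgroups, process each symbol's data, and then recombine all subgroups into the original xts matrix with its full set of rows. Each symbol is a field of 1 to 4 characters, and it serves as the factor index for splitting the matrix into subgroups.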
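The split, process, recombine pattern described above can be sketched on a small base-R character matrix. This is only an illustration under stated assumptions: the toy matrix `m` and the helper `process_one` are hypothetical stand-ins, not the `ets` object or the real per-symbol processing, and a plain matrix is used instead of xts so the sketch runs without extra packages.

```r
# Toy character matrix standing in for the real data (hypothetical values).
m <- cbind(Symbol = c("AA", "AAPL", "AA", "ABC", "AAPL"),
           Open   = c("10", "20", "11", "30", "21"),
           Close  = c("12", "19", "13", "31", "22"))

# Hypothetical per-symbol processing step; it returns its input unchanged,
# like function(x) { return(x) } in the timings.
process_one <- function(sub) sub

# Split ROW INDICES by symbol, so each piece keeps its matrix shape.
idx <- split(seq_len(nrow(m)), m[, "Symbol"])
pieces <- lapply(idx, function(i) process_one(m[i, , drop = FALSE]))

# Recombine, then undo the grouping so rows return to their original order.
combined <- do.call(rbind, unname(pieces))
restored <- combined[order(unlist(idx, use.names = FALSE)), , drop = FALSE]
```

Saving the row positions before splitting is what lets the last line restore the original row order. With xts objects the recombination would more likely rely on the time index, which is not shown here.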
These are the timings reported for splitting the matrix with by(), lapply() and ddply():
> dim(ets)
[1] 442750 24
> head(ets)
Symbol DaySec ExchTm LclTm Open High Low Close CloseRet
2011-07-22 09:35:00 "AA" "34500" "09:34:54.697.094" "09:34:54.697.052" " 158100" " 158400" " 157900" " 158200" " 6.325111e-04"
2011-07-22 09:35:00 "AAPL" "34500" "09:34:59.681.827" "09:34:59.681.797" "3899200" "3899200" "3892200" "3894400" "-1.231022e-03"
2011-07-22 09:35:00 "ABC" "34500" "09:34:49.805.994" "09:34:49.806.008" " 400100" " 401800" " 400100" " 401600" " 3.749063e-03"
2011-07-22 09:35:00 "ALL" "34500" "09:34:59.009.001" "09:34:59.008.810" " 285500" " 285500" " 285300" " 285300" "-7.005254e-04"
2011-07-22 09:35:00 "AMAT" "34500" "09:34:59.982.447" "09:34:59.982.423" " 130200" " 130500" " 130200" " 130500" " 2.304147e-03"
2011-07-22 09:35:00 "AMZN" "34500" "09:34:48.012.576" "09:34:48.012.565" "2137400" "2139100" "2137400" "2139100" " 7.953588e-04"
... (15 more columns)
> system.time(by(ets, ets$Symbol, function(x) { return(x) }))
user system elapsed
78.725 0.932 79.735
> system.time(ddply(as.data.frame(ets), "Symbol", function(x) { return (x) }))
user system elapsed
100.590 0.416 101.105
> system.time(lapply(split.default(ets, ets$Symbol), function(x) { return(x) }))
user system elapsed
1.572 0.280 1.853
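For more on working with subgroups of data frames and matrices, see this excellent blog post.
Why is there such a large performance difference when using lapply / split.default?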
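One thing that may be worth checking when comparing these timings is whether split.default() does the same work as the other two calls. The sketch below uses a plain base-R character matrix (a hypothetical toy `m`, not the `ets` xts object, whose behaviour may differ): split.default() treats the matrix as one long vector and recycles the grouping factor over it, so each group comes back as a flat vector with no rows or columns, whereas splitting row indices keeps each group as a sub-matrix.

```r
# Toy character matrix (hypothetical values), stored column by column.
m <- cbind(Symbol = c("AA", "AAPL", "AA", "AAPL"),
           Open   = c("1", "2", "3", "4"),
           Close  = c("5", "6", "7", "8"))

# split.default() sees 12 elements and recycles the 4-element factor over them.
flat <- split.default(m, m[, "Symbol"])

# Row-wise split: group the row numbers, then subset whole rows.
rows   <- split(seq_len(nrow(m)), m[, "Symbol"])
pieces <- lapply(rows, function(i) m[i, , drop = FALSE])

is.null(dim(flat$AA))   # the flat group has lost its matrix shape
dim(pieces$AA)          # the row-wise group is still a 2 x 3 matrix
```

If the same holds for the real object, part of the gap in the timings would come from the calls not producing equivalent results, rather than from lapply() itself being faster.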
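Answer 0 (score: 0)
Working in numeric mode shortens the processing time considerably, from about 77 seconds to under 6 seconds elapsed for the same twelve columns: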
> system.time(by(myxts[,c(1,2,3,4,5)], myxts$Symbol, summary))
user system elapsed
57.768 0.688 58.511
> system.time(by(myxts[,c(1,2,3,4,5,6,7,8)], myxts$Symbol, summary))
user system elapsed
62.284 0.620 62.971
> system.time(by(myxts[,c(1,2,3,4,5,6,7,8, 9, 10, 11, 12)], myxts$Symbol, summary))
user system elapsed
76.529 0.632 77.232
> myxts.numeric = myxts
> mode(myxts.numeric) = "numeric"
Warning message:
In as.double.xts(c("AA", "AAPL", "ABC", "ALL", "AMAT", "AMZN", "BAC", :
NAs introduced by coercion
> system.time(by(myxts.numeric[,c(1,2,3,4,5,6,7,8, 9, 10, 11, 12)], myxts$Symbol, summary))
user system elapsed
4.948 0.688 5.642
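The coercion warning above arises because the Symbol column cannot be converted to numbers, which is why the grouping in the last call still uses Symbol from the original character object. A minimal sketch of one way to avoid the NAs, assuming it is acceptable to keep the symbols in a separate vector, is shown below on a toy base-R matrix. The names `m`, `sym`, `num` and the per-symbol column means are illustrative assumptions, not part of the original data or analysis.

```r
# Toy character matrix (hypothetical values, padded like the original output).
m <- cbind(Symbol = c("AA", "AAPL", "AA", "AAPL"),
           Open   = c(" 1", " 2", " 3", " 4"),
           Close  = c(" 5", " 6", " 7", " 8"))

# Keep the grouping key apart from the columns that really are numbers.
sym <- m[, "Symbol"]
num <- m[, c("Open", "Close"), drop = FALSE]

# Convert only the numeric columns; no NAs appear because Symbol is excluded.
mode(num) <- "numeric"

# Per-symbol work on the numeric matrix, grouped by the saved symbols.
rows_by_symbol <- split(seq_len(nrow(num)), sym)
means <- lapply(rows_by_symbol, function(i) colMeans(num[i, , drop = FALSE]))
```

Here `means$AA` holds the Open and Close averages for the AA rows, computed on doubles rather than on character strings.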