I am trying to calculate the sum of the values (XYZmin) that occur on the same date.
My data looks like this:
bar <- structure(list(date = structure(c(15622, 15622, 15622, 15628,
15632, 15635, 15639, 15639, 15639, 15639, 15639, 15642, 15646,
15646, 15650, 15650, 15650, 15657, 15660, 15660, 15674, 15681,
15691, 15695, 15709, 15716, 15723, 15730, 15737, 15737, 15737,
15737, 15737, 15737, 15740, 15743, 15743, 15743, 15744, 15744,
15744, 15744, 15746, 15751, 15755, 15758), class = "Date"), XYZmin = c(-20,
-15, -10, -70, -60, -60, -95, -10, -10, -40, -25, -25, -20, -10,
-3, -5, -25, -5, -70, -5, -30, -30, -25, 60, 60, 60, 60, 60,
-10, -10, -30, -30, -10, -10, -10, -60, -30, -10, 75, -10, -10,
-10, 60, 60, -15, 60)), .Names = c("date", "XYZmin"), class = "data.frame", row.names = c(NA,
-46L))
head(bar)
        date XYZmin
1 2012-10-09    -20
2 2012-10-09    -15
3 2012-10-09    -10
4 2012-10-15    -70
5 2012-10-19    -60
6 2012-10-22    -60
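What I am trying to accomplish is to create a new variable, XYZtot, that holds a running total within each date: where a date occurs more than once, the second row for that date is the sum of its first and second values, and the third row is the sum of its first, second and third values. This is what I am aiming for: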
head(new_bar_with_XYZtot)
        date XYZmin XYZtot
1 2012-10-09    -20    -20
2 2012-10-09    -15    -35
3 2012-10-09    -10    -45
4 2012-10-15    -70    -70
5 2012-10-19    -60    -60
6 2012-10-22    -60    -60
microbenchmark test
library(plyr)  # provides ddply() and .()
alexwhan <- function(bar,date,XYZmin) ddply(bar, .(date), transform, XYZmin.sum = cumsum(XYZmin))
Arun <- function(bar,date,XYZmin) within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)})
agstudy <- function(bar,date,XYZmin) transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))
# install.packages("data.table", dependencies = TRUE)
library(data.table)
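# Braces keep both statements inside the function body; without them only the
# data.table() conversion belongs to the function (and is what gets timed)
mnel <- function(bar,date,XYZmin) { bar <- data.table(bar); bar[, XYZmin.sum := cumsum(XYZmin), by = date] }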
# install.packages("microbenchmark", dependencies = TRUE)
require(microbenchmark)
# run test
res <- microbenchmark(alexwhan(bar,date,XYZmin), Arun(bar,date,XYZmin), agstudy(bar,date,XYZmin), mnel(bar,date,XYZmin), times = 666)
## Print results:
print(res)
The numbers:
Unit: microseconds
                        expr       min        lq    median        uq       max neval
 alexwhan(bar, date, XYZmin) 14484.077 15056.613 15237.760 15945.482 72650.126   666
     Arun(bar, date, XYZmin)   963.632  1018.311  1070.759  1138.655  4988.226   666
  agstudy(bar, date, XYZmin)  1967.292  2021.115  2078.261  2158.689  9240.500   666
     mnel(bar, date, XYZmin)   251.312   270.295   282.821   325.040  6540.367   666
### Plot results:
boxplot(res)
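Before comparing timings, it can be worth confirming that the candidates return the same values. The following is only a sketch using base R: it rebuilds the first six rows of the data under the assumed name bar_small, and the split/unsplit reference computation is an illustration added here, not one of the benchmarked functions.

```r
# First six rows of the data (bar_small is a hypothetical name)
bar_small <- data.frame(
  date   = as.Date(c("2012-10-09", "2012-10-09", "2012-10-09",
                     "2012-10-15", "2012-10-19", "2012-10-22")),
  XYZmin = c(-20, -15, -10, -70, -60, -60)
)

# The two base R candidates from the benchmark
via_within    <- within(bar_small, {XYZtot <- ave(XYZmin, date, FUN = cumsum)})
via_transform <- transform(bar_small, XYZtot = ave(XYZmin, date, FUN = cumsum))

# Independent reference: split by date, cumsum each piece, reassemble
reference <- unsplit(lapply(split(bar_small$XYZmin, bar_small$date), cumsum),
                     bar_small$date)

# Stops with an error if any candidate disagrees with the reference
stopifnot(isTRUE(all.equal(via_within$XYZtot, reference)),
          isTRUE(all.equal(via_transform$XYZtot, reference)))
```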
Answer 0 (score: 5)
If you want speed, I would suggest a data.table solution:
library(data.table)
bar <- data.table(bar)
# assigning within bar
bar[, XYZmin.sum := cumsum(XYZmin), by = date]
This will scale to large data!
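As a small illustration of the grouped assignment, here is a sketch on the first six rows of the question's data. The names bar_small and dt are made up for this example; it assumes the data.table package is installed.

```r
library(data.table)

# First six rows of the question's data (bar_small is a hypothetical name)
bar_small <- data.frame(
  date   = as.Date(c("2012-10-09", "2012-10-09", "2012-10-09",
                     "2012-10-15", "2012-10-19", "2012-10-22")),
  XYZmin = c(-20, -15, -10, -70, -60, -60)
)

# as.data.table() makes a copy, so bar_small itself stays a plain data.frame
dt <- as.data.table(bar_small)

# := adds the column to dt in place; by = date restarts cumsum() for each date
dt[, XYZmin.sum := cumsum(XYZmin), by = date]
print(dt)
```

Because := modifies dt by reference, no reassignment such as dt <- dt[...] is needed.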
Answer 1 (score: 3)
Here is one using ave:
bar <- within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)})
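For readers unfamiliar with ave, here is a rough sketch of what it does in this call: it splits the first argument by the grouping variable, applies FUN to each piece, and puts the results back in the original row order. The toy vectors x and g are invented for illustration and are not part of the question's data.

```r
# Toy data (hypothetical): the two groups are deliberately interleaved
x <- c(1, 10, 2, 20, 3)
g <- c("a", "b", "a", "b", "a")

# Running total within each group, returned in the original row order
running <- ave(x, g, FUN = cumsum)
print(running)
# 1 10 3 30 6
```

The result has the same length as x, which is why it can be assigned directly as a new column.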
Answer 2 (score: 3)
Using ave, but with transform:
transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))
Edit after the OP's comment:
transform(bar, XYZtot = ave(XYZmin, date, FUN =
function(x)
if(length(x) < 1) NA
else c(cumsum(x[-length(x)]),NA)))
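To see what the edited version returns, here is a sketch on the first six rows of the question's data. The helper name lagged_total is made up for this example; its body is the anonymous function from the code above.

```r
# First six rows of the question's data
date   <- as.Date(c("2012-10-09", "2012-10-09", "2012-10-09",
                    "2012-10-15", "2012-10-19", "2012-10-22"))
XYZmin <- c(-20, -15, -10, -70, -60, -60)

# Same logic as the edited answer: running sum over all but the last value
# of each group, with NA in the last position (lagged_total is a hypothetical name)
lagged_total <- function(x) {
  if (length(x) < 1) NA else c(cumsum(x[-length(x)]), NA)
}

XYZtot <- ave(XYZmin, date, FUN = lagged_total)
print(XYZtot)
# -20 -35 NA NA NA NA
```

The last row of every date gets NA, so a date that occurs only once yields NA alone.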
Answer 3 (score: 1)
Is this what you are after?
library(plyr)  # provides ddply() and .()
bar.sum <- ddply(bar, .(date), transform,
                 XYZmin.sum = cumsum(XYZmin))
bar.sum