(日期)内的总和

时间:2013-03-12 20:33:32

标签: r sum base

我正在尝试计算在同一天发生的值的总和(XYZmin)。

我的数据看起来像这样,

bar <- structure(list(date = structure(c(15622, 15622, 15622, 15628, 
15632, 15635, 15639, 15639, 15639, 15639, 15639, 15642, 15646, 
15646, 15650, 15650, 15650, 15657, 15660, 15660, 15674, 15681, 
15691, 15695, 15709, 15716, 15723, 15730, 15737, 15737, 15737, 
15737, 15737, 15737, 15740, 15743, 15743, 15743, 15744, 15744, 
15744, 15744, 15746, 15751, 15755, 15758), class = "Date"), XYZmin = c(-20, 
-15, -10, -70, -60, -60, -95, -10, -10, -40, -25, -25, -20, -10, 
-3, -5, -25, -5, -70, -5, -30, -30, -25, 60, 60, 60, 60, 60, 
-10, -10, -30, -30, -10, -10, -10, -60, -30, -10, 75, -10, -10, 
-10, 60, 60, -15, 60)), .Names = c("date", "XYZmin"), class = "data.frame", row.names = c(NA, 
-46L))

head(bar)   
        date XYZmin
1 2012-10-09    -20
2 2012-10-09    -15
3 2012-10-09    -10
4 2012-10-15    -70
5 2012-10-19    -60
6 2012-10-22    -60

我正在努力完成的是创建一个新的变量XYZtot,其中,在多次出现的数据中,将第一个和第二个值加在第二个数据上,并将第一个和第二个相加,和第3个数据的第3个值。这是我的目标。

head(new_bar_with_XYZtot) 

        date XYZmin XYZtot
1 2012-10-09    -20    -20
2 2012-10-09    -15    -35
3 2012-10-09    -10    -40
4 2012-10-15    -70    -70
5 2012-10-19    -60    -60
6 2012-10-22    -60    -60

使用 big microbenchmark测试

进行了更新
alexwhan <- function(bar,date,XYZmin) ddply(bar, .(date), transform, XYZmin.sum = cumsum(XYZmin))

Arun <- function(bar,date,XYZmin) within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)})

agstudy <- function(bar,date,XYZmin) transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))

# install.packages("data.table", dependencies = TRUE)
library(data.table)
mnel <- function(bar,date,XYZmin)  bar <- data.table(bar); bar[, XYZmin.sum := cumsum(XYZmin), by = date]

# install.packages("microbenchmark", dependencies = TRUE)
require(microbenchmark)

# run test
res <- microbenchmark(alexwhan(bar,date,XYZmin), Arun(bar,date,XYZmin), agstudy(bar,date,XYZmin), mnel(bar,date,XYZmin), times = 666)


## Print results:
print(res)

数字,

Unit: microseconds
                        expr       min        lq    median        uq       max neval
 alexwhan(bar, date, XYZmin) 14484.077 15056.613 15237.760 15945.482 72650.126   666
     Arun(bar, date, XYZmin)   963.632  1018.311  1070.759  1138.655  4988.226   666
  agstudy(bar, date, XYZmin)  1967.292  2021.115  2078.261  2158.689  9240.500   666
     mnel(bar, date, XYZmin)   251.312   270.295   282.821   325.040  6540.367   666


### Plot results:
boxplot(res)

Plot results

4 个答案:

答案 0 :(得分:5)

如果你想要速度,我会提出一个data.table解决方案

library(data.table)
bar <- data.table(bar)

# assigning within bar
bar[, XYZmin.sum := cumsum(XYZmin), by = date]

这将扩展为大数据!

答案 1 :(得分:3)

这是使用ave

的人
bar <- within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)})

答案 2 :(得分:3)

使用avetransform

transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum))
OP评论后

编辑

transform(bar, XYZtot = ave(XYZmin, date, FUN = 
                          function(x)
                            if(length(x) < 1) NA 
                            else c(cumsum(x[-length(x)]),NA)))

答案 3 :(得分:1)

这就是你要追求的吗?

bar.sum <- ddply(bar, .(date), transform,
                 XYZmin.sum = cumsum(XYZmin))
bar.sum