I have a data frame like this:
data=data.frame(ID=c("0001","0002","0003","0004","0004","0004","0001","0001","0002","0003"),Saldo=c(10,10,10,15,20,50,100,80,10,10),place=c("grocery","market","market","cars","market","market","cars","grocery","cars","cars"))
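I am trying to compute the total Saldo for each person in the ID variable, using cumsum or apply, but I am not getting the result I want. I would like something like this: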
ID Saldo.Total
1 0001 190
2 0002 20
3 0003 20
4 0004 85
Answer 0 (score: 5)
You can use aggregate:
> aggregate(Saldo ~ ID, data, function(x) max(cumsum(x))) ## same as sum(x) here, since no Saldo value is negative
ID Saldo
1 0001 190
2 0002 20
3 0003 20
4 0004 85
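Since only the per-ID total is needed here, a plain sum should do as well, and it stays correct even if some Saldo values were negative. A minimal sketch, assuming the data frame from the question; renaming the column to Saldo.Total is only there to match the desired output:

```r
data <- data.frame(
  ID    = c("0001", "0002", "0003", "0004", "0004",
            "0004", "0001", "0001", "0002", "0003"),
  Saldo = c(10, 10, 10, 15, 20, 50, 100, 80, 10, 10),
  place = c("grocery", "market", "market", "cars", "market",
            "market", "cars", "grocery", "cars", "cars")
)

# One row per ID, with Saldo summed within each ID
totals <- aggregate(Saldo ~ ID, data, sum)

# Rename the summed column to match the output asked for in the question
names(totals)[2] <- "Saldo.Total"
totals
```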
If you really are interested in the cumulative sum within each ID, try the following:
within(data, {
  Saldo.Total <- ave(Saldo, ID, FUN = cumsum)
})
# ID Saldo place Saldo.Total
# 1 0001 10 grocery 10
# 2 0002 10 market 10
# 3 0003 10 market 10
# 4 0004 15 cars 15
# 5 0004 20 market 35
# 6 0004 50 market 85
# 7 0001 100 cars 110
# 8 0001 80 grocery 190
# 9 0002 10 cars 20
# 10 0003 10 cars 20
Answer 1 (score: 1)
I think you may have been confused, because what you want is not a cumulative sum; it is just a sum:
library(plyr)
ddply(
data,
.(ID),
summarize,
Saldo.Total=sum(Saldo)
)
Output:
ID Saldo.Total
1 0001 190
2 0002 20
3 0003 20
4 0004 85
A cumulative sum is the "running total" as you move along a vector, for example:
> x = c(1, 2, 3, 4, 5)
> cumsum(x)
[1] 1 3 6 10 15
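To tie the two ideas together, here is a small sketch showing that the last element of a cumulative sum is the ordinary sum, and that taking the maximum of the running total only matches the sum when no value is negative. The vector y is made up purely for illustration:

```r
x <- c(1, 2, 3, 4, 5)
running <- cumsum(x)

# The final running total is the plain sum
tail(running, 1) == sum(x)

# Hypothetical vector with a negative entry
y <- c(5, -3, 1)
cumsum(y)        # running totals: 5 2 3
max(cumsum(y))   # 5, the peak of the running total
sum(y)           # 3, the actual total
```

So for a per-ID total, sum is the safer choice; cumsum is for when the intermediate totals themselves are wanted.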