I have a table:
ppp<-data.frame(client=c(1,1,1,3,3),
calldate=c('2014-08-07', '2014-08-09','2014-08-06','2014-08-07', '2014-08-08'),
cant=c(1,2,3,2,1))
I need to compute the cumulative sum of cant for each client, in date order. For this data, I need to get the following table:
client  calldate    cant  cum_cant
1       06/08/2014  3     3
1       07/08/2014  1     4
1       09/08/2014  2     6
3       07/08/2014  2     2
3       08/08/2014  1     3
I tried this and got the correct result:
ppp <- ppp[order(ppp$client,ppp$calldate),]
ppp$cumsum<-unlist(tapply(ppp$cant,ppp$client,FUN=cumsum))
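But is this the best approach, creating a list for each client and then unlisting it? Also, I never declared calldate as a date field; I just sorted the data as it is.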
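On the second point, storing calldate as a Date is the safer choice: ISO "YYYY-MM-DD" strings happen to sort chronologically as text, but other layouts, such as the day/month/year form used in the expected table, do not. A minimal base R sketch, reusing the question's sort-then-tapply approach; the column name cumcant is only an illustrative choice:

```r
ppp <- data.frame(client   = c(1, 1, 1, 3, 3),
                  calldate = c('2014-08-07', '2014-08-09', '2014-08-06',
                               '2014-08-07', '2014-08-08'),
                  cant     = c(1, 2, 3, 2, 1))

# Store calldate as a Date so ordering is chronological rather than
# alphabetical; ISO "YYYY-MM-DD" strings parse without a format argument.
ppp$calldate <- as.Date(ppp$calldate)

# Day/month/year strings need an explicit format, otherwise as.Date()
# does not read them as day/month/year.
as.Date("06/08/2014", format = "%d/%m/%Y")   # prints "2014-08-06"

# Same approach as in the question: sort by client, then by date, and take
# the running total of cant within each client.
ppp <- ppp[order(ppp$client, ppp$calldate), ]
ppp$cumcant <- unlist(tapply(ppp$cant, ppp$client, FUN = cumsum),
                      use.names = FALSE)
ppp
```

The tapply/unlist step only lines up with the rows because the data is sorted by client first; tapply returns its groups in sorted client order.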
Answer 0 (score: 5)
The dplyr package makes this very easy:
library(dplyr)
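ppp %>%
  arrange(client, calldate) %>%   # sort by client, then date (arrange() ignores grouping by default)
  group_by(client) %>%
  mutate(cumcant = cumsum(cant))  # running total within each client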
#client calldate cant cumcant
#1 1 2014-08-06 3 3
#2 1 2014-08-07 1 4
#3 1 2014-08-09 2 6
#4 3 2014-08-07 2 2
#5 3 2014-08-08 1 3
Answer 1 (score: 5)
Or a data.table option:
library(data.table) # 1.9.4+
setorder(setDT(ppp), client, calldate)[, cum_cant := cumsum(cant), by = client]
ppp
# client calldate cant cum_cant
# 1: 1 2014-08-06 3 3
# 2: 1 2014-08-07 1 4
# 3: 1 2014-08-09 2 6
# 4: 3 2014-08-07 2 2
# 5: 3 2014-08-08 1 3
Edit: for older data.table versions (< 1.9.4), use setkey instead of setorder:
setkey(setDT(ppp), client, calldate)[, cum_cant := cumsum(cant), by = client]
Edit #2 (following the OP's comment), adding a cumulative minimum alongside the cumulative sum:
setkey(setDT(ppp), client, calldate)[, `:=`(cum_cant = cumsum(cant),
cummin_cant = cummin(cant)), by = client]
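For comparison, the same two running statistics can be computed in base R with ave(), without data.table. This is a minimal sketch that assumes the data is first sorted by client and date, as the setkey call above does; the column names mirror the data.table version:

```r
ppp <- data.frame(client   = c(1, 1, 1, 3, 3),
                  calldate = as.Date(c('2014-08-07', '2014-08-09', '2014-08-06',
                                       '2014-08-07', '2014-08-08')),
                  cant     = c(1, 2, 3, 2, 1))

# Sort once by client and date; every running statistic below depends on it.
ppp <- ppp[order(ppp$client, ppp$calldate), ]

# ave() applies FUN within each client group and returns a vector aligned
# with the rows, so several cumulative columns can be added side by side.
ppp$cum_cant    <- ave(ppp$cant, ppp$client, FUN = cumsum)
ppp$cummin_cant <- ave(ppp$cant, ppp$client, FUN = cummin)
ppp
```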
Answer 2 (score: 3)
Here is an option using ave:
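# Sort the rows first: reordering only cant (and not client or the rows)
# would attach the running totals to the wrong rows.
ppp <- ppp[order(ppp$client, ppp$calldate), ]
ppp$cumcant <- with(ppp, ave(cant, client, FUN = cumsum))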
ppp
# client calldate cant cumcant
# 3 1 2014-08-06 3 3
# 1 1 2014-08-07 1 4
# 2 1 2014-08-09 2 6
# 4 3 2014-08-07 2 2
# 5 3 2014-08-08 1 3
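Reassigning the sorted result to ppp changes its row order. If the original row order should be kept, one base R sketch is to compute the ordering index once, apply it to both cant and client, and write the running totals back through the same index. The variable name o is a hypothetical helper:

```r
ppp <- data.frame(client   = c(1, 1, 1, 3, 3),
                  calldate = c('2014-08-07', '2014-08-09', '2014-08-06',
                               '2014-08-07', '2014-08-08'),
                  cant     = c(1, 2, 3, 2, 1))

# Row positions in client/date order.
o <- with(ppp, order(client, calldate))

# Reorder cant AND client together, take the grouped running total, then
# write each value back to the row it came from.
ppp$cumcant <- NA_real_
ppp$cumcant[o] <- ave(ppp$cant[o], ppp$client[o], FUN = cumsum)
ppp
```

Here the rows stay in their original order, so row 1 (client 1, 2014-08-07) gets 4 and row 3 (client 1, 2014-08-06) gets 3.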