I have the following data frame:
ID Group SubmitDate BookDate Amount Total
1 A 2011-01-01 2011-01-01 100 0
2 A 2011-10-01 2011-05-01 0 100
3 B 2012-01-01 2012-02-20 500 0
4 B 2012-02-01 2012-04-01 300 0
5 B 2012-03-01 2012-03-15 400 500
6 B 2012-03-16 2012-03-18 900 900
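For each ID, I want Total to equal the sum of Amount over the previous rows in the same group whose BookDate is before the current row's SubmitDate.
For example, for ID 5, the SubmitDate is after the BookDate of ID 3 only (we only look at IDs in the same group), so Total = 500.
For ID 6, the SubmitDate is after the BookDate of IDs 3 and 5 only (again within the same group), so Total = 500 + 400 = 900.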
To reproduce this example:
data <- data.frame(ID = c(1,2,3,4,5,6),
Group = c("A","A","B","B","B","B"),
SubmitDate = as.Date(c("2011-01-01","2011-10-01","2012-01-01","2012-02-01","2012-03-01","2012-03-16")),
BookDate = as.Date(c("2011-01-01","2011-05-01","2012-02-20","2012-04-01","2012-03-15","2012-03-18")),
Amount = as.numeric(c("100","0","500","300","400","900")))
I was thinking of something like this, but it only compares the SubmitDate and BookDate of the current row.
data %>% group_by(Group) %>% mutate(Total = cumsum(SubmitDate < BookDate))
Answer 0 (score: 0)
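Assuming there is a typo, so that the Total for ID == 4 is 0 and the Total for ID == 2 is 100, and that the OP does not mind a solution using the data.table package, here is one approach: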
library(data.table)
setDT(data)
data[, Total :=
data[data, sum(x.Amount[-.N][x.BookDate[-.N] < i.SubmitDate]), by=.EACHI, on=.(Group)]$V1
]
#    ID Group SubmitDate   BookDate Amount Total
# 1:  1     A 2011-01-01 2011-01-01    100     0
# 2:  2     A 2011-10-01 2011-05-01      0   100
# 3:  3     B 2012-01-01 2012-02-20    500     0
# 4:  4     B 2012-02-01 2012-04-01    300     0
# 5:  5     B 2012-03-01 2012-03-15    400   500
# 6:  6     B 2012-03-16 2012-03-18    900   900
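Explanation:
1) data[data, j=..., by=.EACHI, on=.(Group)] performs a self-join on Group and, for each row of i, evaluates j. See ?data.table for the meaning of i, j, by, .EACHI and on.
2) . is an alias for list.
3) In an x[i, ...] join (that is, x joined with i), x.* refers to columns of x, the table to the left of [. x.BookDate[-.N] is the BookDate column of the matched group minus its last element, which the answer takes to be the current row. Strictly, -.N drops the group's last row whatever the current row is; on the example data this still yields the expected totals.
4) x.Amount[-.N][x.BookDate[-.N] < i.SubmitDate] subsets the amounts that meet the OP's condition, and sum() adds them up.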
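As a cross-check on the join above, the rule from the question can also be written out row by row in base R. This is only a minimal sketch, not part of the original answer: the helper name total_before is hypothetical, and it assumes the rows are already in the order in which "previous" should be understood.

```r
# Rebuild the example data from the question.
data <- data.frame(
  ID = 1:6,
  Group = c("A", "A", "B", "B", "B", "B"),
  SubmitDate = as.Date(c("2011-01-01", "2011-10-01", "2012-01-01",
                         "2012-02-01", "2012-03-01", "2012-03-16")),
  BookDate = as.Date(c("2011-01-01", "2011-05-01", "2012-02-20",
                       "2012-04-01", "2012-03-15", "2012-03-18")),
  Amount = c(100, 0, 500, 300, 400, 900)
)

# Hypothetical helper (an assumption, not from the answer): for row k,
# sum Amount over rows that come earlier in the table, belong to the
# same Group, and have a BookDate strictly before row k's SubmitDate.
total_before <- function(k, d) {
  prev <- seq_len(k - 1)  # indices of all earlier rows (empty for k = 1)
  keep <- d$Group[prev] == d$Group[k] & d$BookDate[prev] < d$SubmitDate[k]
  sum(d$Amount[prev][keep])
}

# Apply the helper to every row index and store the result as Total.
data$Total <- vapply(seq_len(nrow(data)), total_before, numeric(1), d = data)
print(data)
```

This loops over every row and rescans all earlier rows, so it is quadratic in the number of rows. It is meant for checking results on small data, while the data.table join is the better fit for larger tables.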