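How to calculate a cumsum for the current row, conditional on the values of other rows in another column

Time: 2018-05-23 21:34:02

Tags: r dplyr

I have a data frame like this: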

ID Group SubmitDate   BookDate      Amount Total
1     A  2011-01-01   2011-01-01    100    0
2     A  2011-10-01   2011-05-01      0    100
3     B  2012-01-01   2012-02-20    500    0
4     B  2012-02-01   2012-04-01    300    0
5     B  2012-03-01   2012-03-15    400    500
6     B  2012-03-16   2012-03-18    900    900 

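For each ID, I want Total to equal the sum of Amount over the previous rows in the same Group whose BookDate is before the current row's SubmitDate.

For example, for ID 5, its SubmitDate is only after the BookDate of ID 3 (we only look at IDs in the same group), so Total = 500.

For ID 6, its SubmitDate is only after the BookDate of IDs 3 and 5 (again, only IDs in the same group), so Total = 500 + 400 = 900.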
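The rule above can be written out directly as a loop over rows. This is only a minimal base R sketch of the stated rule, assuming that "previous rows" means rows that appear earlier in the data frame; the helper name total_before is made up for illustration.

```r
# Rebuild the example data with the same values as in the post.
data <- data.frame(
  ID = 1:6,
  Group = c("A", "A", "B", "B", "B", "B"),
  SubmitDate = as.Date(c("2011-01-01", "2011-10-01", "2012-01-01",
                         "2012-02-01", "2012-03-01", "2012-03-16")),
  BookDate = as.Date(c("2011-01-01", "2011-05-01", "2012-02-20",
                       "2012-04-01", "2012-03-15", "2012-03-18")),
  Amount = c(100, 0, 500, 300, 400, 900)
)

# Hypothetical helper: for row k, sum Amount over earlier rows (position < k)
# in the same Group whose BookDate is strictly before row k's SubmitDate.
total_before <- function(k, d) {
  earlier <- seq_len(k - 1)
  keep <- d$Group[earlier] == d$Group[k] &
    d$BookDate[earlier] < d$SubmitDate[k]
  sum(d$Amount[earlier][keep])
}

data$Total <- sapply(seq_len(nrow(data)), total_before, d = data)
data$Total
# 0 100 0 0 500 900
```

This is quadratic in the number of rows, so it is meant as a readable statement of the requirement rather than an efficient solution.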

To reproduce this example:

data <- data.frame(ID = c(1,2,3,4,5,6),
               Group = c("A","A","B","B","B","B"),
               SubmitDate = as.Date(c("2011-01-01","2011-10-01","2012-01-01","2012-02-01","2012-03-01","2012-03-16")),
               BookDate = as.Date(c("2011-01-01","2011-05-01","2012-02-20","2012-04-01","2012-03-15","2012-03-18")),
               Amount = as.numeric(c("100","0","500","300","400","900")))

I was thinking of something like this, but it only compares the SubmitDate and BookDate of the current row.

library(dplyr)
data %>% group_by(Group) %>% mutate(Total = cumsum(SubmitDate < BookDate))
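To make the shortcoming concrete, here is a small base R sketch of what that grouped cumsum computes. It uses ave() as a stand-in for group_by() plus mutate(), which is a substitution made here for illustration and not the code from the question. The result is a running count of rows where SubmitDate is before BookDate, not a conditional sum of Amount.

```r
# Example data from the question (only the columns needed here).
data <- data.frame(
  ID = 1:6,
  Group = c("A", "A", "B", "B", "B", "B"),
  SubmitDate = as.Date(c("2011-01-01", "2011-10-01", "2012-01-01",
                         "2012-02-01", "2012-03-01", "2012-03-16")),
  BookDate = as.Date(c("2011-01-01", "2011-05-01", "2012-02-20",
                       "2012-04-01", "2012-03-15", "2012-03-18"))
)

# Row-wise comparison: each row is only compared with itself.
same_row_flag <- as.integer(data$SubmitDate < data$BookDate)

# Running count of that flag within each Group.
running_count <- ave(same_row_flag, data$Group, FUN = cumsum)
running_count
# 0 0 1 2 3 4
```

No row's SubmitDate is ever compared with another row's BookDate, which is why this cannot produce the desired Total.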

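1 Answer:

Answer 0: (score: 0)

Assuming there is a typo, that Total is 0 for ID == 4 and 100 for ID == 2, and that the OP doesn't mind a solution using the data.table package, here is one approach: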

library(data.table)
setDT(data)
data[, Total :=
    data[data, sum(x.Amount[-.N][x.BookDate[-.N] < i.SubmitDate]), by=.EACHI, on=.(Group)]$V1
]

#   ID Group SubmitDate   BookDate Amount Total
#1:  1     A 2011-01-01 2011-01-01    100     0
#2:  2     A 2011-10-01 2011-05-01      0   100
#3:  3     B 2012-01-01 2012-02-20    500     0
#4:  4     B 2012-02-01 2012-04-01    300     0
#5:  5     B 2012-03-01 2012-03-15    400   500
#6:  6     B 2012-03-16 2012-03-18    900   900

Explanation:

1) data[data, j=..., by=.EACHI, on=.(Group)] performs a self-join on Group and evaluates j once for each row of i. See ?data.table for the meaning of i, j, by=.EACHI and on. .() is an alias for list().

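3) In an x[i, ...] join (i.e. x joined with i), x.* refers to columns of x, the table to the left of the [. x.BookDate[-.N] is the matched BookDate values with the last one dropped. Note that .N is the number of x rows matched for the current row of i, so this drops the last row of the group, which is the current row only when the current row is last in its group. For the other rows in this example, the date comparison in 4) is what excludes the current and later rows.

4) x.Amount[-.N][x.BookDate[-.N] < i.SubmitDate] subsets the amounts to those that meet the OP's condition, and sum() adds them up.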
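Because [-.N] drops the last row of the group rather than the current row, a variant that compares row positions explicitly may be easier to reason about on other data. The following is a sketch under that assumption, not the answer's original code; the column name rn is a hypothetical helper introduced here. It also prints the number of matched rows per row of i to show what .N means inside a by=.EACHI join.

```r
library(data.table)

data <- data.table(
  ID = 1:6,
  Group = c("A", "A", "B", "B", "B", "B"),
  SubmitDate = as.Date(c("2011-01-01", "2011-10-01", "2012-01-01",
                         "2012-02-01", "2012-03-01", "2012-03-16")),
  BookDate = as.Date(c("2011-01-01", "2011-05-01", "2012-02-20",
                       "2012-04-01", "2012-03-15", "2012-03-18")),
  Amount = c(100, 0, 500, 300, 400, 900)
)

# Inside by = .EACHI, .N is the number of x rows matched on Group,
# i.e. the group size, not the position of the current row.
matched <- data[data, .(n = .N), by = .EACHI, on = .(Group)]$n
matched
# 2 2 4 4 4 4

# Hypothetical helper column: position of each row within its Group.
data[, rn := seq_len(.N), by = Group]

# For each row of i, sum Amount over rows of the same Group that come
# earlier (x.rn < i.rn) and were booked before this row's SubmitDate.
totals <- data[data,
  sum(x.Amount[x.rn < i.rn & x.BookDate < i.SubmitDate]),
  by = .EACHI, on = .(Group)]$V1
data[, Total := totals]
data$Total
# 0 100 0 0 500 900
```

On the example data this gives the same totals as the answer's code. The explicit x.rn < i.rn test simply avoids relying on the dates to exclude the current row.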