I want to count, by ID, the number of rows that precede the current row within the previous one-year window.
Here is my data:
df <- structure(list(id = c("1", "1", "1", "1",
"2", "2", "2", "2", "2", "2", "2",
"2", "2"), flag = c(1, 1, 0, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1), date = structure(c(15425, 15456, 16613,
16959, 15513, 15513, 15625, 15635, 15649, 15663, 15670, 16051,
16052), class = "Date")), sorted = "id", class = c("data.table",
"data.frame"), row.names = c(NA, -13L))
roll_sum <- c(0, 1, 0, 1, 0, 1, 2, 3, 4, 5, 6, 0, 1)
flag_sum <- c(0, 1, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1)
df_desired <- cbind(df, roll_sum) # roll_sum: number of rows excluding current row in 1 year time frame rolling
df_desired <- cbind(df_desired, flag_sum) # flag_sum: number of rows excluding current row in 1 year time frame rolling where flag was 1
Data:
    id flag       date
 1:  1    1 2012-03-26
 2:  1    1 2012-04-26
 3:  1    0 2015-06-27
 4:  1    1 2016-06-07
 5:  2    0 2012-06-22
 6:  2    0 2012-06-22
 7:  2    1 2012-10-12
 8:  2    1 2012-10-22
 9:  2    1 2012-11-05
10:  2    1 2012-11-19
11:  2    1 2012-11-26
12:  2    1 2013-12-12
13:  2    1 2013-12-13
Output:
df_desired
    id flag       date roll_sum flag_sum
 1:  1    1 2012-03-26        0        0
 2:  1    1 2012-04-26        1        1
 3:  1    0 2015-06-27        0        0
 4:  1    1 2016-06-07        1        0
 5:  2    0 2012-06-22        0        0
 6:  2    0 2012-06-22        1        0
 7:  2    1 2012-10-12        2        0
 8:  2    1 2012-10-22        3        1
 9:  2    1 2012-11-05        4        2
10:  2    1 2012-11-19        5        3
11:  2    1 2012-11-26        6        4
12:  2    1 2013-12-12        0        0
13:  2    1 2013-12-13        1        1
I tried the zoo-based solution given by G. Grothendieck in "Compute rolling sum by id variables, with missing timepoints", but it gave me an error:
Error in merge.zoo(z, g) : series cannot be merged with non-unique index entries in a series In addition: Warning message: In zoo(count, date) :
I used make.index.unique and make.time.unique to make the date column unique.
I would appreciate any help with an optimized solution. Thanks.
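Answer 0 (score: 1)
Not sure whether this will scale to the dimensions of your data.
First, create a running row index to handle duplicate dates, because the sum for a row must not include a later row that has the same date. Also create a column holding the date one year earlier (I think 365 days is better, but it seems the OP wants 366).
Then perform a non-equi self-join, making sure that later rows with a duplicate date are excluded and that the matched dates fall within the one-year window.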
Code:
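# rn: row number, used to break ties between duplicate dates; oneYrAgo: start of the window
df[, c("rn", "oneYrAgo") := .(.I, date - 366)]
# Non-equi self-join: for each row, match earlier rows of the same id dated within the window
df[df,
   .(roll_sum=.N, flag_sum=sum(flag, na.rm=TRUE)),
   on=.(date >= oneYrAgo, rn < rn, id, date <= date),
   by=.EACHI][,
   -seq_len(2L)]  # drop the first two columns of the join result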
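
As a cross-check of the window definition that the join above encodes, here is a minimal brute-force sketch in base R. It is not part of the original answer: the helper name `in_window` is hypothetical, the data is rebuilt as a plain data.frame from the values in the question, and the 366-day window is taken from the `oneYrAgo` column above. It compares every row against every other row, so it is only meant for verifying results on small inputs, not as an optimized solution.

```r
# Sample data from the question, rebuilt as a plain data.frame (assumption:
# the same values as the dput() output, without the data.table class).
df <- data.frame(
  id   = c(rep("1", 4), rep("2", 9)),
  flag = c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  date = as.Date(c(15425, 15456, 16613, 16959, 15513, 15513, 15625,
                   15635, 15649, 15663, 15670, 16051, 16052),
                 origin = "1970-01-01"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for row i, flag the earlier rows (smaller row number)
# of the same id whose date lies in [date_i - 366, date_i].
in_window <- function(i) {
  j <- seq_len(nrow(df))
  j < i & df$id == df$id[i] &
    df$date >= df$date[i] - 366 & df$date <= df$date[i]
}

idx <- seq_len(nrow(df))
# Number of earlier rows inside the window.
roll_sum <- vapply(idx, function(i) sum(in_window(i)), integer(1))
# Sum of flag over those same rows.
flag_sum <- vapply(idx, function(i) sum(df$flag[in_window(i)]), numeric(1))

cbind(df, roll_sum, flag_sum)
```

On the sample data this reproduces the roll_sum and flag_sum vectors given in the question, including the tie between rows 5 and 6, where row 6 counts row 5 but not the other way round.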