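I have a dataset containing participants' assessments and their dates. After each event, the participant may not be assessed again for a fixed period, so I need to ignore any other assessments that fall within that period.
So far I have implemented this with some hacky loops, but it relies on looping and comparisons, and my real dataset has hundreds of thousands of rows. I'm wondering whether anyone can come up with a more "pure" data.table solution.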
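library(data.table)
# Gap that must be exceeded after a kept assessment before the next one counts
exclusionPeriod = 5
# Example input: one row per assessment, sorted by start within each id
dt = data.table(id = c('a','a','a','a','b','b','b','c','c','c','c'),
                start = c(1, 2, 7, 9, 1, 8, 12, 2, 4, 5, 8))
# Expected output: only the assessments that fall outside every exclusion period
modelOut = data.table(id = c('a','a','b','b','c','c'),
                      start = c(1, 7, 1, 8, 2, 8))
print(dt)
print(modelOut)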
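# Time elapsed since the first assessment of each id (that row has csum == 0)
dt[, diff := c(0, diff(start)), by = id]
dt[, csum := cumsum(diff), by = id]
# Rows with csum > exclusionPeriod are still undecided: restart the count at the
# first such row of each id and repeat until none remain (maxIts is a safety cap)
maxIts = 100
its = 0
while (nrow(dt[csum > exclusionPeriod]) > 0 && its < maxIts) {
  dt[csum > exclusionPeriod, diff := c(0, diff(start)), by = id]
  dt[csum > exclusionPeriod, csum := cumsum(diff), by = id]
  its = its + 1
}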
# Kept assessments are the rows where the count was (re)started
out = dt[csum == 0, .(id, start)]
print(out)
print(all.equal(modelOut, out))
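
To pin down the rule I'm after, here it is written as a single sequential pass per id. This is only a minimal sketch, not the "pure" data.table answer I'm hoping for: keepOutsideWindow is a hypothetical helper name (not part of data.table), dtScan and outScan are illustrative names, and the sketch assumes each id's rows are sorted by start. It uses the same boundary as the loop above, so a gap of exactly exclusionPeriod is still excluded.

```r
library(data.table)

# Hypothetical helper (not a data.table function): returns TRUE for every
# start that opens a new exclusion window. Assumes `start` is sorted ascending.
keepOutsideWindow <- function(start, period) {
  keep <- logical(length(start))
  lastKept <- -Inf  # no assessment kept yet, so the first one always passes
  for (i in seq_along(start)) {
    # Keep the row only if it lies strictly more than `period` after the last kept row
    if (start[i] - lastKept > period) {
      keep[i] <- TRUE
      lastKept <- start[i]
    }
  }
  keep
}

exclusionPeriod <- 5
dtScan <- data.table(id = c('a','a','a','a','b','b','b','c','c','c','c'),
                     start = c(1, 2, 7, 9, 1, 8, 12, 2, 4, 5, 8))

# Sort by reference so each id is scanned in date order
setorder(dtScan, id, start)

# One pass per id; the result has the columns id and start
outScan <- dtScan[, .(start = start[keepOutsideWindow(start, exclusionPeriod)]), by = id]
print(outScan)
```

This still loops in R within each id, so it may not be fast enough on hundreds of thousands of rows, but each row is visited only once and there is no iteration cap to tune.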