Enforcing a minimum gap between adjacent events in data.table

Asked: 2018-05-31 05:16:59

Tags: r data.table

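I have a dataset of participant assessments and their dates. After each event, a participant is not eligible for assessment for a period of time, so I need to ignore any other assessments that fall within that period.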
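To make the rule concrete, here is a minimal base R sketch for a single participant. It assumes the start times are already sorted and that an assessment is kept only when it starts strictly more than the exclusion period after the last kept one (the strict inequality is inferred from the sample data further down, not stated explicitly). The helper name `keep_spaced` is hypothetical.

```r
# Hypothetical helper: flag which sorted start times survive the exclusion rule.
keep_spaced <- function(start, gap) {
  keep <- logical(length(start))
  last_kept <- -Inf                      # nothing has been kept yet
  for (i in seq_along(start)) {
    # Keep a row only if it lies more than `gap` after the last kept row
    if (start[i] - last_kept > gap) {
      keep[i] <- TRUE
      last_kept <- start[i]
    }
  }
  keep
}

keep_spaced(c(1, 2, 7, 9), gap = 5)
# TRUE FALSE TRUE FALSE
```

Each decision depends on the previously kept row rather than on the immediately preceding row, which is why a single `diff()` per group is not enough.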

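So far I have implemented this with some hacky loops, but it relies on repeated looping and comparisons, and my real dataset has hundreds of thousands of rows. I'm wondering whether anyone can come up with a more "pure" data.table solution.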

library(data.table)
exclusionPeriod = 5

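# Sample assessments; start is already sorted within each id
dt = data.table(id = c('a','a','a','a','b','b','b','c','c','c','c'),
                start = c(1, 2, 7, 9, 1, 8, 12, 2, 4, 5, 8))

# Expected result after dropping assessments inside an exclusion period
modelOut = data.table(id = c('a','a','b','b','c','c'),
                      start = c(1, 7, 1, 8, 2, 8))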

print(dt)
print(modelOut)

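# Gap to the previous assessment and cumulative time since the first one, per id
dt[, diff := c(0, diff(start)), by = id]
dt[, csum := cumsum(diff), by = id]

# Rows with csum > exclusionPeriod fall outside the current window: restart the
# count from the first such row, and repeat until no row is left outside
maxIts = 100
its = 0
while (nrow(dt[csum > exclusionPeriod]) > 0 && its < maxIts) {
  dt[csum > exclusionPeriod, diff := c(0, diff(start)), by = id]
  dt[csum > exclusionPeriod, csum := cumsum(diff), by = id]
  its = its + 1
}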

# Rows that start a window (csum == 0) are the assessments to keep
out = dt[csum == 0, .(id, start)]
print(out)
print(all.equal(modelOut,out))
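
One possible loop-free formulation, offered as a sketch rather than a tested solution for the full dataset: carry the most recently kept start time forward with `Reduce(..., accumulate = TRUE)` inside a grouped `j` expression, then take the distinct values. It assumes `start` is sorted within each `id` and uses the same strict "greater than" rule as the loop above. The helper name `next_anchor` is hypothetical.

```r
library(data.table)

exclusionPeriod <- 5
dt <- data.table(id = c('a','a','a','a','b','b','b','c','c','c','c'),
                 start = c(1, 2, 7, 9, 1, 8, 12, 2, 4, 5, 8))

# Hypothetical helper: move the anchor to the current row only when it lies
# more than exclusionPeriod after the previous anchor; otherwise keep the anchor.
next_anchor <- function(anchor, current) {
  if (current - anchor > exclusionPeriod) current else anchor
}

# Within each id, the distinct anchors are the assessments to keep.
out <- dt[, .(start = unique(Reduce(next_anchor, start, accumulate = TRUE))),
          by = id]
print(out)
```

On the sample data this keeps starts 1 and 7 for `a`, 1 and 8 for `b`, and 2 and 8 for `c`, which matches `modelOut`. Note that `Reduce` still walks each group element by element at the R level, so this is more compact than the `while` loop but not truly vectorized; whether it is fast enough for hundreds of thousands of rows has not been measured here.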

0 answers:

No answers yet