This question is a follow-up to this post.
> tempDT <- data.table(colA = c("E","E","A","A","E","A","E","A","E","A")
+ , lags = c(NA,1,1,2,3,1,2,NA,NA,1)
+ , group = c(1,1,1,1,1,1,1,2,2,2))
> tempDT
    colA lags group
 1:    E   NA     1
 2:    E    1     1
 3:    A    1     1
 4:    A    2     1
 5:    E    3     1
 6:    A    1     1
 7:    E    2     1
 8:    A   NA     2
 9:    E   NA     2
10:    A    1     2
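I have a column colA and need to find the lag (the number of rows) between the current row and the most recent previous row where colA == "E".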
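To make the definition concrete, here is a minimal sketch that recomputes the lag directly from row positions within each group. It assumes the lag is counted per `group` (as the `by = "group"` in the approaches suggests) and that a row with no earlier "E" in its group should get NA. The column name `lag_check` is made up for illustration, and this is not meant as a benchmarked or faster solution.

```r
library(data.table)

tempDT <- data.table(colA  = c("E","E","A","A","E","A","E","A","E","A"),
                     lags  = c(NA,1,1,2,3,1,2,NA,NA,1),
                     group = c(1,1,1,1,1,1,1,2,2,2))

# lag_check is a hypothetical column name used only for this illustration.
tempDT[, lag_check := {
  i <- seq_len(.N)                               # row position within the group
  ePos <- ifelse(colA == "E", i, 0L)             # position of each "E", 0 elsewhere
  lastE <- shift(cummax(ePos), fill = 0L)        # latest "E" strictly before each row
  replace(i - lastE, lastE == 0L, NA_integer_)   # NA when no earlier "E" exists
}, by = group]

tempDT
```

On the sample data, `lag_check` should match the `lags` column, including the NA for row 9, which is an "E" with no earlier "E" in its group.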
@Frank suggested two approaches:
w = tempDT[colA == "E", which = TRUE]
tempDT[, v := shift(rowid(findInterval(.I, w))), by = "group"]

tempDT[, v := shift(rowid(cumsum(colA == "E"))), by = "group"]
Since I have more than 72 million records, I would like to know whether there is any faster way to compute this.