R data.table: finding the lag between the current row and a previous row (speed benchmark)

Time: 2018-03-12 05:25:13

Tags: r performance data.table row lag

This question is a follow-up to this post.

> tempDT <- data.table(colA = c("E","E","A","A","E","A","E","A","E","A")
+                      , lags = c(NA,1,1,2,3,1,2,NA,NA,1)
+                      , group = c(1,1,1,1,1,1,1,2,2,2))
> tempDT
    colA lags group
 1:    E   NA     1
 2:    E    1     1
 3:    A    1     1
 4:    A    2     1
 5:    E    3     1
 6:    A    1     1
 7:    E    2     1
 8:    A   NA     2
 9:    E   NA     2
10:    A    1     2

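I have a column colA and, for each row, need to find the lag (the number of rows) between the current row and the most recent previous row where colA == "E", computed within each group. The lags column above shows the expected result.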

@Frank suggested two approaches:

 w = tempDT[colA == "E", which=TRUE]; tempDT[, v := shift(rowid(findInterval(.I, w))), by = "group"]

tempDT[, v:= shift(rowid(cumsum(colA=="E"))), by = "group"]
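To make the two one-liners easier to compare, here is a minimal sketch that runs both on the sample data and lists any rows where the result differs from the expected lags column. The column names v1 and v2 and the helper differs() are hypothetical names introduced only for this illustration. By my reading, both approaches agree with each other, and they match lags on every row except row 9 (an "E" with no earlier "E" in its group), where they return 1 instead of NA, so that edge case may need separate handling.

```r
library(data.table)

# Sample data from the question; `lags` holds the expected result.
tempDT <- data.table(colA  = c("E","E","A","A","E","A","E","A","E","A"),
                     lags  = c(NA,1,1,2,3,1,2,NA,NA,1),
                     group = c(1,1,1,1,1,1,1,2,2,2))

# Approach 1: global positions of "E", then findInterval() on the row index.
w <- tempDT[colA == "E", which = TRUE]
tempDT[, v1 := shift(rowid(findInterval(.I, w))), by = "group"]

# Approach 2: running count of "E" within each group.
tempDT[, v2 := shift(rowid(cumsum(colA == "E"))), by = "group"]

# NA-aware inequality: TRUE where exactly one side is NA or the values differ.
differs <- function(x, y) {
  xor(is.na(x), is.na(y)) | (!is.na(x) & !is.na(y) & x != y)
}

# Rows where either approach disagrees with the expected `lags`.
print(tempDT[differs(v1, lags) | differs(v2, lags)])
```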

Since I have more than 72 million records, I would like to know whether there is any other way to compute this faster.
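One direction that might be worth timing is to avoid the by = "group" step entirely and work on whole columns with cummax(). The sketch below is only an illustration, not a measured improvement: the function lag_since_prev_E(), the column v3, the synthetic table bigDT, and its size and group count are all assumptions made for this example, and it assumes the rows of each group are contiguous. Whether it is actually faster on 72 million rows would need to be checked on the real data.

```r
library(data.table)

# Hypothetical helper: distance to the most recent earlier "E" in the same
# group, NA when the group has no earlier "E". Assumes contiguous groups.
lag_since_prev_E <- function(is_e, grp) {
  idx <- seq_along(is_e)
  # Index of the latest "E" at or before each row (0 if none so far).
  last_e <- cummax(ifelse(is_e, idx, 0L))
  # Shift by one row so only strictly earlier rows count.
  prev_e <- shift(last_e, fill = 0L)
  # Index of the first row of each row's group.
  grp_start <- match(grp, grp)
  out <- idx - prev_e
  # An "E" found before the group starts belongs to another group.
  out[prev_e < grp_start] <- NA_integer_
  out
}

# Check against the sample data and expected `lags` from the question.
tempDT <- data.table(colA  = c("E","E","A","A","E","A","E","A","E","A"),
                     lags  = c(NA,1,1,2,3,1,2,NA,NA,1),
                     group = c(1,1,1,1,1,1,1,2,2,2))
tempDT[, v3 := lag_since_prev_E(colA == "E", group)]
stopifnot(identical(as.numeric(tempDT$v3), tempDT$lags))

# Rough timing on synthetic data (size and group count are arbitrary).
set.seed(1)
n <- 1e6L
bigDT <- data.table(colA  = sample(c("A", "E"), n, replace = TRUE),
                    group = sort(sample.int(1000L, n, replace = TRUE)))

t_cumsum <- system.time(
  bigDT[, v2 := shift(rowid(cumsum(colA == "E"))), by = "group"]
)
t_cummax <- system.time(
  bigDT[, v3 := lag_since_prev_E(colA == "E", group)]
)

# Elapsed seconds for each variant; results will vary by machine.
print(c(cumsum_by_group   = t_cumsum[["elapsed"]],
        cummax_no_groupby = t_cummax[["elapsed"]]))
```

The two variants are not expected to be identical on every row: the ungrouped sketch returns NA for an "E" that has no earlier "E" in its group, as the lags column requires, while the grouped cumsum version counts from the first row of the group.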

0 answers:

No answers