Question

library(data.table)
counting <- structure(
  list(
    unique = c(1000,1001,1002,1003,1004,1005,1006,1007,1008,1000,1001,1002,1003,1004), 
    increment = c(0,0,0,1,0,0,0,1,1,0,1,0,1,0)
  ), 
  .Names = c("unique", "increment"), 
  class = "data.frame", 
  row.names = c(NA, -14L))  # 14 rows, matching the length of each column
setDT(counting)
class(counting)
counting

The setup above prints:

    unique increment
 1:   1000         0
 2:   1001         0
 3:   1002         0
 4:   1003         1
 5:   1004         0
 6:   1005         0
 7:   1006         0
 8:   1007         1
 9:   1008         1
10:   1000         0
11:   1001         1
12:   1002         0
13:   1003         1
14:   1004         0

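I have been trying to train my brain away from Excel-style 'if else' statements.

What is the best way to vectorise creating a new column that starts at (for example) 100, increases only according to the 'increment' column, and resets to 100 every time 'unique' equals 1000?

The desired output would be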

    unique increment runningTally
 1:   1000         0          100
 2:   1001         0          100
 3:   1002         0          100
 4:   1003         1          101
 5:   1004         0          101
 6:   1005         0          101
 7:   1006         0          101
 8:   1007         1          102
 9:   1008         1          103
10:   1000         0          100
11:   1001         1          101
12:   1002         0          101
13:   1003         1          102
14:   1004         0          102

Thanks for your insight. I believe I should stay away from loops, since this will run on millions of rows.

Answer 1

Try

counting[, runningTally:=cumsum(increment)+100, by=cumsum(unique==1000)]
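
To see why this works, it may help to look at the grouping key on its own. The following is only a small base R sketch on made-up vectors (`unique_id`, `increment`, `grp` and `tally` are illustrative names, not part of the original data): `cumsum(unique_id == 1000)` goes up by one at every reset row, so each run gets its own group id, and the tally is a per-group cumulative sum plus 100.

```r
# Illustrative toy data (assumed, not the original `counting` table)
unique_id <- c(1000, 1001, 1002, 1003, 1000, 1001)
increment <- c(0, 0, 1, 1, 0, 1)

# Each row where unique_id == 1000 starts a new group
grp <- cumsum(unique_id == 1000)
print(grp)    # 1 1 1 1 2 2

# Cumulative sum of increment within each group, offset by the start value 100
tally <- ave(increment, grp, FUN = cumsum) + 100
print(tally)  # 100 100 101 102 100 101
```

The data.table call above does the same thing, with the `by=` expression playing the role of `grp`.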

Update

For a more general case, where the reset row itself may carry an increment of 1, perhaps the following helps

counting[,runningTally:=cumsum(c(0,increment[-1]))+100,
                                     by=cumsum(unique==1000)]
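
The difference between the two versions only shows up when a reset row has `increment` equal to 1. Here is a small sketch on made-up data (the table `dt` and the column names `simple` and `general` are assumptions for illustration): the first version counts the reset row's own increment, while the second replaces the first increment of each group with 0, so every run starts at exactly 100.

```r
library(data.table)

# Illustrative toy data: both reset rows (unique == 1000) have increment 1
dt <- data.table(unique    = c(1000, 1001, 1002, 1000, 1001),
                 increment = c(1, 0, 1, 1, 1))

# First version: the reset row's increment is counted, so runs start at 101
dt[, simple := cumsum(increment) + 100, by = cumsum(unique == 1000)]

# Updated version: ignore the first increment of each group, so runs start at 100
dt[, general := cumsum(c(0, increment[-1])) + 100, by = cumsum(unique == 1000)]

print(dt$simple)   # 101 101 102 101 102
print(dt$general)  # 100 100 101 100 101
```

On the data in the question both versions give the same result, because every row with `unique == 1000` has an increment of 0.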

Answer 2

In dplyr, similar to akrun's approach in data.table, you could do:

library(dplyr)
counting %>% group_by(grp = cumsum(unique == 1000)) %>%
  mutate(n = cumsum(increment) + 100) %>%
  ungroup() %>% select(-grp)  # to remove the grouping column again

Source: local data frame [14 x 3]

   unique increment   n
1    1000         0 100
2    1001         0 100
3    1002         0 100
4    1003         1 101
5    1004         0 101
6    1005         0 101
7    1006         0 101
8    1007         1 102
9    1008         1 103
10   1000         0 100
11   1001         1 101
12   1002         0 101
13   1003         1 102
14   1004         0 102
