library(data.table)
counting <- structure(
list(
unique = c(1000,1001,1002,1003,1004,1005,1006,1007,1008,1000,1001,1002,1003,1004),
increment = c(0,0,0,1,0,0,0,1,1,0,1,0,1,0)
),
.Names = c("unique", "increment"),
class = "data.frame",
row.names = c(NA, -14L)) # 14 rows, matching the 14 values in each column
setDT(counting)
class(counting)
counting
This sets up the following table:
unique increment
1: 1000 0
2: 1001 0
3: 1002 0
4: 1003 1
5: 1004 0
6: 1005 0
7: 1006 0
8: 1007 1
9: 1008 1
10: 1000 0
11: 1001 1
12: 1002 0
13: 1003 1
14: 1004 0
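I keep trying to train my brain to leave Excel-style 'if else' statements behind.
What is the best way to vectorise creating a new column that starts at (for example) 100, increases only according to the 'increment' column, and resets to 100 every time 'unique' equals 1000?
The desired output would be:
unique increment runningTally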
1: 1000 0 100
2: 1001 0 100
3: 1002 0 100
4: 1003 1 101
5: 1004 0 101
6: 1005 0 101
7: 1006 0 101
8: 1007 1 102
9: 1008 1 103
10: 1000 0 100
11: 1001 1 101
12: 1002 0 101
13: 1003 1 102
14: 1004 0 102
Thanks for any insights. I believe I should stay away from loops, since this will run on millions of rows.
Answer 0 (score: 4)
Try
counting[, runningTally:=cumsum(increment)+100, by=cumsum(unique==1000)]
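To see why this works, it may help to look at the grouping key on its own. The sketch below uses base R only and a small made-up vector (`unique_id`, `increment`, `grp`, and `tally` are illustrative names, not part of the question's data): `cumsum(unique_id == 1000)` goes up by one at every row where the value is 1000, so each reset starts a new group, and a per-group `cumsum` of `increment` then gives the tally.

```r
# Small illustrative input (assumed values, not the question's data)
unique_id <- c(1000, 1001, 1002, 1000, 1001)
increment <- c(0, 1, 0, 0, 1)

# TRUE at each reset row; the running sum of those TRUEs is a group id
grp <- cumsum(unique_id == 1000)

# Cumulative sum of increment within each group, shifted to start at 100
tally <- ave(increment, grp, FUN = cumsum) + 100

print(data.frame(unique_id, increment, grp, tally))
```

Here `grp` comes out as 1, 1, 1, 2, 2 and `tally` as 100, 101, 101, 100, 101. The `by = cumsum(unique == 1000)` argument in the data.table call plays the role of `grp`.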
For a more general case, perhaps the following helps:
counting[,runningTally:=cumsum(c(0,increment[-1]))+100,
by=cumsum(unique==1000)]
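The difference between the two versions only shows up when a reset row itself has a non-zero increment. As a rough illustration, assuming a made-up table `demo` (not the question's data) in which both reset rows have `increment` equal to 1, the second version replaces the first increment of each group with 0, so every group starts at exactly 100:

```r
library(data.table)

# Hypothetical data: the rows where unique == 1000 have increment 1
demo <- data.table(unique    = c(1000, 1001, 1002, 1000, 1001),
                   increment = c(1, 0, 1, 1, 1))

# First version: the reset row's own increment is counted
demo[, simple := cumsum(increment) + 100, by = cumsum(unique == 1000)]

# Second version: the first increment in each group is treated as 0
demo[, general := cumsum(c(0, increment[-1])) + 100,
     by = cumsum(unique == 1000)]

print(demo)
```

With this input `simple` is 101, 101, 102, 101, 102 while `general` is 100, 100, 101, 100, 101. On the question's data the two agree, because every reset row there has an increment of 0.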
Answer 1 (score: 3)
In dplyr, similar to akrun's approach in data.table, you can do:
library(dplyr)
counting %>% group_by(grp = cumsum(unique == 1000)) %>%
mutate(n = cumsum(increment) + 100) %>%
ungroup() %>% select(-grp) # to remove the grouping column again
Source: local data frame [14 x 3]
unique increment n
1 1000 0 100
2 1001 0 100
3 1002 0 100
4 1003 1 101
5 1004 0 101
6 1005 0 101
7 1006 0 101
8 1007 1 102
9 1008 1 103
10 1000 0 100
11 1001 1 101
12 1002 0 101
13 1003 1 102
14 1004 0 102