I have been searching for a while but have not found what I am looking for. For illustration, suppose we have the following dataset:
library(data.table)
set.seed(666)
foo = data.table(id = 1:20, value = sample(c(1, -1), 20, replace = TRUE))
    id value
 1:  1    -1
 2:  2     1
 3:  3    -1
 4:  4     1
 5:  5     1
 6:  6    -1
 7:  7    -1
 8:  8     1
 9:  9     1
10: 10     1
11: 11    -1
12: 12     1
13: 13     1
14: 14     1
15: 15     1
16: 16    -1
17: 17     1
18: 18    -1
19: 19     1
20: 20     1
I would like a new, unique group ID to be created every time value changes from one row to the next, resulting in:
    id value grp
 1:  1    -1   1
 2:  2     1   2
 3:  3    -1   3
 4:  4     1   4
 5:  5     1   4
 6:  6    -1   5
 7:  7    -1   5
 8:  8     1   6
 9:  9     1   6
10: 10     1   6
11: 11    -1   7
12: 12     1   8
13: 13     1   8
14: 14     1   8
15: 15     1   8
16: 16    -1   9
17: 17     1  10
18: 18    -1  11
19: 19     1  12
20: 20     1  12
I can do this with a loop:
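foo[, cc := value == shift(value)][is.na(cc), cc := FALSE]
for (i in seq_len(nrow(foo))) {
  # cc is FALSE on the first row of each run, so remember that row index
  if (!foo[i]$cc) pp = i
  foo[i, grp := pp]
}
# renumber the run start indices as consecutive group IDs
foo[, grp := as.numeric(as.factor(grp))]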
Is there a smarter way to do this?
Answer 0 (score: 1)
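We can use rleid from data.table, which generates a run-length ID that increases by one each time the value changes: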
foo[, grp := rleid(value)]
foo
#    id value grp
# 1:  1    -1   1
# 2:  2     1   2
# 3:  3    -1   3
# 4:  4     1   4
# 5:  5     1   4
# 6:  6    -1   5
# 7:  7    -1   5
# 8:  8     1   6
# 9:  9     1   6
#10: 10     1   6
#11: 11    -1   7
#12: 12     1   8
#13: 13     1   8
#14: 14     1   8
#15: 15     1   8
#16: 16    -1   9
#17: 17     1  10
#18: 18    -1  11
#19: 19     1  12
#20: 20     1  12
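For anyone curious what a run-length ID amounts to, here is a rough base R sketch of the same idea. It is not how data.table implements rleid; the helper names run_id_cumsum and run_id_rle are made up for illustration, and the sketch assumes a single vector with no NA values. The value vector is copied by hand from the table in the question, so it does not depend on the random seed.

```r
# Hypothetical helper (not part of data.table): flag each position where the
# value differs from the previous one, then take the running total of the flags.
run_id_cumsum <- function(x) {
  if (length(x) == 0L) return(integer(0))
  changed <- c(TRUE, x[-1L] != x[-length(x)])  # the first element always starts a run
  cumsum(changed)
}

# Hypothetical helper: the same result via base R's rle(), repeating each
# run's index once per element in that run.
run_id_rle <- function(x) {
  runs <- rle(x)
  rep(seq_along(runs$lengths), runs$lengths)
}

# Values copied from the question's table (assumed to contain no NA)
value <- c(-1, 1, -1, 1, 1, -1, -1, 1, 1, 1,
           -1, 1, 1, 1, 1, -1, 1, -1, 1, 1)

grp <- run_id_cumsum(value)
print(grp)
#  [1]  1  2  3  4  4  5  5  6  6  6  7  8  8  8  8  9 10 11 12 12

# Both sketches agree with each other
stopifnot(identical(grp, run_id_rle(value)))
```

In practice rleid(value) is the simpler choice inside a data.table; the sketch is only meant to show the logic behind the grp column.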