Generate a new group ID each time a variable changes its value

Date: 2018-05-08 13:39:42

Tags: r grouping

I have been searching for a while but have not found what I am after. For illustration, suppose we have the following data set:

library(data.table)
set.seed(666)
foo = data.table(id = 1:20, value = sample(c(1, -1), 20, replace = TRUE))
    id value
 1:  1    -1
 2:  2     1
 3:  3    -1
 4:  4     1
 5:  5     1
 6:  6    -1
 7:  7    -1
 8:  8     1
 9:  9     1
10: 10     1
11: 11    -1
12: 12     1
13: 13     1
14: 14     1
15: 15     1
16: 16    -1
17: 17     1
18: 18    -1
19: 19     1
20: 20     1

I would like a unique group ID to be created every time value changes, resulting in

    id value grp
 1:  1    -1   1
 2:  2     1   2
 3:  3    -1   3
 4:  4     1   4
 5:  5     1   4
 6:  6    -1   5
 7:  7    -1   5
 8:  8     1   6
 9:  9     1   6
10: 10     1   6
11: 11    -1   7
12: 12     1   8
13: 13     1   8
14: 14     1   8
15: 15     1   8
16: 16    -1   9
17: 17     1  10
18: 18    -1  11
19: 19     1  12
20: 20     1  12

I can do it with a loop:

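# cc is TRUE when a row has the same value as the previous row
foo[, cc := value == shift(value)][is.na(cc), cc := FALSE]

for (i in seq_len(nrow(foo))) {
  # a change in value starts a new group, labelled by its first row index
  if (!foo[i]$cc) pp = i
  foo[i, grp := pp]
}

# renumber the group labels as consecutive integers 1, 2, 3, ...
foo[, grp := as.numeric(as.factor(grp))]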

Is there a smarter way?

1 Answer:

Answer 0 (score: 1)

We can use rleid from data.table, which assigns a new ID to each run of consecutive identical values:

foo[, grp := rleid(value)]
foo
#    id value grp
# 1:  1    -1   1
# 2:  2     1   2
# 3:  3    -1   3
# 4:  4     1   4
# 5:  5     1   4
# 6:  6    -1   5
# 7:  7    -1   5
# 8:  8     1   6
# 9:  9     1   6
#10: 10     1   6
#11: 11    -1   7
#12: 12     1   8
#13: 13     1   8
#14: 14     1   8
#15: 15     1   8
#16: 16    -1   9
#17: 17     1  10
#18: 18    -1  11
#19: 19     1  12
#20: 20     1  12
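
To show what rleid is doing, here is a minimal sketch that builds the same IDs by hand in two other ways. It is only an illustration, not how data.table implements rleid. The values are typed in from the table above instead of being drawn with set.seed, since random draws may differ between R versions. The column names grp_cumsum and grp_rle, and the summary table runs, are made up for this example.

```r
library(data.table)

# Same values as the question's table, entered by hand (assumption: this
# avoids depending on the random number generator).
foo <- data.table(
  id = 1:20,
  value = c(-1, 1, -1, 1, 1, -1, -1, 1, 1, 1, -1, 1, 1, 1, 1, -1, 1, -1, 1, 1)
)

# rleid(): one call, consecutive run IDs
foo[, grp := rleid(value)]

# By hand: flag rows whose value differs from the previous row, then take a
# running count of the flags. The first row always starts a new run.
foo[, grp_cumsum := cumsum(c(TRUE, value[-1] != value[-.N]))]

# Base R: rle() gives the run lengths; repeat each run number that many times
foo[, grp_rle := with(rle(value), rep(seq_along(lengths), lengths))]

# Once grp exists it can be used like any grouping column,
# for example to summarise each run
runs <- foo[, .(value = value[1], first_id = id[1], n = .N), by = grp]
```

All three columns should agree. Unlike the cumsum and rle versions, rleid also accepts several columns at once, e.g. rleid(a, b), and then starts a new ID whenever any of them changes.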