Using data.table::rleid to create a sequence while excluding some rows based on another condition

Time: 2018-09-04 07:20:23

Tags: r dplyr data.table

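I want to create run ids for consecutive identical values, as data.table::rleid does. The catch is that some rows should be excluded from the sequence, and the rows to exclude are determined by another column. I found that data.table::rleid can be applied twice, but that still does not give the desired result, see below: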

my_example <- structure(list(event = c(234, 234, 224, 232, 232, 201, 201, 201, 
201, 201, 201, 201, 244, 244, 201, 201, 201, 244, 244, 212, 201, 
201, 201, 249, 201, 201, 201, 201, 201, 201, 201, 249, 201, 201, 
244, 244, 201, 261, 245, 203, 204, 204, 201, 201, 201, 201, 201, 
201, 216, 201), subgroup = c(10L, 11L, 10L, 10L, 11L, 10L, 10L, 
10L, 10L, 10L, 10L, 11L, 11L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 
10L, 11L, 11L, 11L, 11L, 11L, 11L, 10L, 11L, 11L, 11L, 10L, 10L,  
10L, 10L, 11L, 11L, 10L, 11L, 10L, 10L, 11L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 11L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -50L), .Names = c("event", "subgroup"))

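library(dplyr)  # provides %>% and mutate()

my_example %>% 
  mutate(in_seq = !event %in% c(224, 232, 234, 261),  # FALSE for rows to exclude
         seq = data.table::rleid(subgroup) * in_seq,  # run id, 0 for excluded rows
         seq2 = data.table::rleid(seq))               # renumber the runs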

# A tibble: 50 x 5
    event subgroup in_seq   seq  seq2
    <dbl>    <int> <lgl>  <int> <int>
 1   234       10 F          0     1
 2   234       11 F          0     1
 3   224       10 F          0     1
 4   232       10 F          0     1
 5   232       11 F          0     1
 6   201       10 T          5     2
 7   201       10 T          5     2
 8   201       10 T          5     2
 9   201       10 T          5     2
10   201       10 T          5     2
# ... with 40 more rows

How can I exclude some rows from the count? (In the example above, this means that rows 1:5 and row 38 should have NA in seq2.)

1 Answer:

Answer 0 (score: 1)

If we want to change the values in 's2' to NA for the excluded rows:

library(data.table)
library(dplyr)

my_example %>% 
  mutate(in_seq = !event %in% c(224, 232, 234, 261), 
         s1 = rleid(subgroup * in_seq),   # excluded rows become 0 before the run ids
         s2 = rleid(s1) * NA ^ !in_seq)   # NA ^ TRUE is NA, NA ^ FALSE is 1
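
The `NA ^ !in_seq` term relies on R's rule that `NA ^ 0` is 1 while `NA ^ 1` is NA, so multiplying by it blanks out the excluded rows and leaves the others unchanged. A minimal sketch on a made-up vector (the names `keep`, `ids`, `mask` and `masked` are illustrative and not from the question):

```r
keep <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)  # hypothetical in_seq flags
ids  <- c(1L, 1L, 2L, 2L, 3L, 4L)                 # hypothetical run ids

mask   <- NA ^ !keep   # NA where keep is FALSE, 1 where keep is TRUE
masked <- ids * mask   # run ids, with the excluded rows set to NA
print(masked)
```

Note that the surviving ids keep their original values, so the numbering does not necessarily start at 1.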

Or, if 's2' needs to start from 1, skipping the FALSE values in 'in_seq':

my_example %>% 
  mutate(in_seq = !event %in% c(224, 232, 234, 261), 
         s1 = data.table::rleid(subgroup) * in_seq,   # 0 for excluded rows
         s2 = (NA^!s1) * s1,                          # turn those 0s into NA
         s2 = match(s2, unique(na.omit(s2))))         # renumber remaining ids from 1
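
The renumbering step can be looked at in isolation: `match()` against the unique non-NA ids maps each id to its position in order of first appearance, and NA stays NA because it is not in the lookup table. A small sketch with a hypothetical vector (the values and the name `renumbered` are made up for illustration):

```r
s2 <- c(NA, NA, 5, 5, 7, 7, NA, 9)   # hypothetical run ids with gaps and NAs

# Position of each id among the distinct non-NA ids, in order of appearance
renumbered <- match(s2, unique(na.omit(s2)))
print(renumbered)
# NA NA  1  1  2  2 NA  3
```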

Or possibly:

setDT(my_example)[, in_seq := !event %in% c(224, 232, 234, 261)
      ][, s1 := rleid(subgroup) * in_seq
       ][s1 != 0, s2 := rleid(s1)]
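
To see what the data.table version produces, here is the same chain run step by step on a toy table, assuming the data.table package is installed. The table `dt` and its values are hypothetical and much smaller than `my_example`:

```r
library(data.table)

# Toy data (hypothetical): rows 1 and 4 carry excluded event codes
dt <- data.table(event    = c(234, 201, 201, 261, 201, 244),
                 subgroup = c(10L, 10L, 11L, 11L, 11L, 11L))

dt[, in_seq := !event %in% c(224, 232, 234, 261)]
dt[, s1 := rleid(subgroup) * in_seq]   # 0 marks an excluded row
dt[s1 != 0, s2 := rleid(s1)]           # rows not selected here keep NA in s2
print(dt$s2)
# NA  1  2 NA  2  2
```

Because `:=` is applied only to the rows where `s1 != 0`, the excluded rows are left as NA and the numbering starts at 1. In this toy data rows 3, 5 and 6 share one id even though row 4 between them is excluded, since row 4 has the same `subgroup` as its neighbours.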