Select rows following a flag value in R

Time: 2015-03-27 22:26:36

Tags: r pattern-matching match

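I have a data frame with two columns, 'group' and 'value'. Within each group, I want to select up to three rows following each value of 4. If fewer than three rows remain before the next group starts, only those 0/1/2 rows should be selected.

Ideally, I would end up with some kind of vector of 1/0s or TRUE/FALSEs indicating whether each row is selected.

Any ideas?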

 # dput() output with the unparseable .internal.selfref pointer removed
 mydf <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("a", 
 "b"), class = "factor"), value = c(6, 5, 4, 6, 1, 4, 1, 4, 6, 
 6, 7, 3, 7, 4, 7, 5, 7, 3, 2, 4)), .Names = c("group", "value"
 ), row.names = c(NA, -20L), class = "data.frame")
 data.table::setDT(mydf)  # convert to a data.table by reference
 mydf

1 Answer:

Answer 0: (score: 2)

It would be better to also show the expected result. This might help:

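 library(data.table)
 # indx counts the 4s seen so far within each group, so each 4 starts a new run
 mydf[, indx := cumsum(value == 4), by = group][
   # within each run, flag positions 2:4, i.e. up to three rows after the 4
   , flag := if (indx != 0) 1:.N %in% 2:4 else FALSE, by = list(group, indx)][
   , indx := NULL][]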
 #   group value  flag
 #1:     a     6 FALSE
 #2:     a     5 FALSE
 #3:     a     4 FALSE
 #4:     a     6  TRUE
 #5:     a     1  TRUE
 #6:     a     4 FALSE
 #7:     a     1  TRUE
 #8:     a     4 FALSE
 #9:     a     6  TRUE
#10:     a     6  TRUE
#11:     b     7 FALSE
#12:     b     3 FALSE
#13:     b     7 FALSE
#14:     b     4 FALSE
#15:     b     7  TRUE
#16:     b     5  TRUE
#17:     b     7  TRUE
#18:     b     3 FALSE
#19:     b     2 FALSE
#20:     b     4 FALSE
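
For readers who do not use data.table, the same idea can be sketched in base R with `ave()`. This is an alternative, not the answerer's code. The helper name `flag_after_four` and its parameter `n` are hypothetical and introduced only for illustration. As in the answer above, a new 4 restarts the count, so rows are flagged only until the next 4 or the end of the group.

```r
# Base R sketch; flag_after_four and n are hypothetical names, not from the answer.
mydf <- data.frame(
  group = factor(rep(c("a", "b"), each = 10)),
  value = c(6, 5, 4, 6, 1, 4, 1, 4, 6, 6, 7, 3, 7, 4, 7, 5, 7, 3, 2, 4)
)

flag_after_four <- function(value, group, n = 3) {
  # run id: number of 4s seen so far within each group (0 = before the first 4)
  run <- ave(as.integer(value == 4), group, FUN = cumsum)
  # position of each row inside its (group, run) block; the 4 itself is position 1
  pos <- ave(seq_along(value), group, run, FUN = seq_along)
  # keep positions 2..(n + 1) of every run that actually starts with a 4
  run > 0 & pos >= 2 & pos <= n + 1
}

mydf$flag <- flag_after_four(mydf$value, mydf$group)
as.integer(mydf$flag)  # 1/0 version of the indicator
```

`mydf[mydf$flag, ]` then returns just the selected rows.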