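I have a data frame made up of two columns, 'group' and 'value'. Within each group, I want to select up to three rows following each value of 4. If fewer than 3 rows remain before the next group starts, only those 0/1/2 rows should be selected.
Ideally, I would end up with some kind of vector of 1/0s or TRUE/FALSEs indicating whether each row is selected.
Any ideas?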
mydf= structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("a",
"b"), class = "factor"), value = c(6, 5, 4, 6, 1, 4, 1, 4, 6,
6, 7, 3, 7, 4, 7, 5, 7, 3, 2, 4)), .Names = c("group", "value"
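), row.names = c(NA, -20L), class = c("data.table", "data.frame"
))  # the .internal.selfref pointer printed by dput() is dropped because it cannot be parsed
# optionally call data.table::setDT(mydf) afterwards to restore the internal reference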
mydf
Answer 0 (score: 2)
It would be better to also show the expected result. This might help.
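library(data.table)
# indx: running count of 4s within each group (0 before the first 4)
# flag: TRUE for rows 2 to 4 of each (group, indx) run, i.e. up to three rows after a 4
mydf[, indx := cumsum(value == 4), by = group][
  , flag := if (indx != 0) 1:.N %in% 2:4 else FALSE, by = list(group, indx)][
  , indx := NULL][]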
# group value flag
#1: a 6 FALSE
#2: a 5 FALSE
#3: a 4 FALSE
#4: a 6 TRUE
#5: a 1 TRUE
#6: a 4 FALSE
#7: a 1 TRUE
#8: a 4 FALSE
#9: a 6 TRUE
#10: a 6 TRUE
#11: b 7 FALSE
#12: b 3 FALSE
#13: b 7 FALSE
#14: b 4 FALSE
#15: b 7 TRUE
#16: b 5 TRUE
#17: b 7 TRUE
#18: b 3 FALSE
#19: b 2 FALSE
#20: b 4 FALSE
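In case it helps to see the same idea broken into steps, here is a rough base R sketch that does not depend on data.table. It rebuilds the example data as a plain data.frame; the intermediate names `indx`, `pos` and `flag` are just illustrative, and it assumes the rows are already in the order you want within each group.

```r
# Rebuild the example data from the question as a plain data.frame.
mydf <- data.frame(
  group = factor(rep(c("a", "b"), each = 10)),
  value = c(6, 5, 4, 6, 1, 4, 1, 4, 6, 6,
            7, 3, 7, 4, 7, 5, 7, 3, 2, 4)
)

# Step 1: within each group, count how many 4s have been seen so far.
# Rows before the first 4 of a group get 0.
indx <- ave(as.integer(mydf$value == 4), mydf$group, FUN = cumsum)

# Step 2: position of each row inside its (group, indx) run.
# Position 1 is the 4 itself (or the first row of the group when indx is 0).
pos <- ave(seq_along(indx), mydf$group, indx, FUN = seq_along)

# Step 3: flag positions 2 to 4 of every run that starts with a 4.
flag <- indx > 0 & pos >= 2 & pos <= 4

mydf$flag <- flag
which(flag)
```

Because a new 4 starts a new run, a 4 that falls within three rows of an earlier 4 is not flagged itself, which matches the output above (rows 6 and 8 stay FALSE).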