Question

Given the following example data table:

library(data.table)
DT <- fread("grp y exclude
a 1 0
a 2 0
a 3 0
a 4 1
a 5 0
a 7 1
a 8 0
a 9 0
a 10 0
b 1 0
b 2 0
b 3 0
b 4 1
b 5 0
b 6 1
b 7 1
b 8 0
b 9 0
b 10 0
c 5 1
d 1 0")

I want to select

1. by group grp,
2. all rows that have y==5,
3. and up to two rows before and after each row from 2. within the group,
4. but for 3., only rows that have exclude==0.

Assuming each group has at most one row with y==5, this yields the desired result for 1.-3.:

idx <- -2:2 # 2 rows before match, the matching row itself, and two rows after match
(row_numbers <- DT[,.I[{
                         x <- rep(which(y==5),each=length(idx))+idx
                         x[x>0 & x<=.N]
                       }], by=grp]$V1)
# [1]  3  4  5  6  7 12 13 14 15 16 20
DT[row_numbers]
#     grp y exclude
#  1:   a 3       0
#  2:   a 4       1
#  3:   a 5       0 # y==5 + two rows before and two rows after
#  4:   a 7       1
#  5:   a 8       0
#  6:   b 3       0
#  7:   b 4       1
#  8:   b 5       0 # y==5 + two rows before and two rows after
#  9:   b 6       1
# 10:   b 7       1
# 11:   c 5       1 # y==5 + nothing, because the group has only 1 element

However, how do I incorporate 4. so that I get

#     grp y exclude
#  1:   a 2       0
#  2:   a 3       0
#  3:   a 5       0
#  4:   a 8       0
#  5:   a 9       0
#  6:   b 2       0
#  7:   b 3       0
#  8:   b 5       0
#  9:   b 8       0
# 10:   b 9       0
# 11:   c 5       1

? It feels like I'm close, but I guess I have been staring at head and tail for too long now, so I'd be thankful for some fresh ideas.

Answer 1

A bit simpler:

DT[DT[, rn := .I][exclude==0 | y==5][, rn[abs(.I - .I[y==5]) <= 2], by=grp]$V1]

 #   grp y exclude rn
 #1:   a 2       0  2
 #2:   a 3       0  3
 #3:   a 5       0  5
 #4:   a 8       0  7
 #5:   a 9       0  8
 #6:   b 2       0 11
 #7:   b 3       0 12
 #8:   b 5       0 14
 #9:   b 8       0 17
#10:   b 9       0 18
#11:   c 5       1 20
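
The one-liner packs three steps into a chain, so it may help to see them unrolled. The following is only a sketch of how I read it; the intermediate names `kept` and `picked` are made up for illustration and do not appear in the original expression.

```r
library(data.table)

# Same example data as in the question, one row per line.
DT <- fread("grp y exclude
a 1 0
a 2 0
a 3 0
a 4 1
a 5 0
a 7 1
a 8 0
a 9 0
a 10 0
b 1 0
b 2 0
b 3 0
b 4 1
b 5 0
b 6 1
b 7 1
b 8 0
b 9 0
b 10 0
c 5 1
d 1 0")

# Step 1: remember each row's position in the original table.
DT[, rn := .I]

# Step 2: keep only the candidate rows (not excluded, or the y==5 row itself).
kept <- DT[exclude == 0 | y == 5]

# Step 3: per group, keep rows at most 2 positions away from the y==5 row.
# .I now counts rows of `kept`, so excluded rows no longer take up a slot.
picked <- kept[, rn[abs(.I - .I[y == 5]) <= 2], by = grp]$V1

# Step 4: use the saved original row numbers to index the full table.
result <- DT[picked]
print(result)
```

Because the distance is measured after the filter in step 2, "two rows before and after" means two non-excluded rows, which is what point 4 of the question asks for. Like the question, this assumes at most one y==5 row per group.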

Answer 2

You were very close. This should do it:

row_numbers <- DT[exclude==0 | y==5, .I[{
    x <- rep(which(y==5), each=length(idx)) + idx 
    x[x>0 & x<=.N]
  }], by=grp]$V1
DT[row_numbers]
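
If a group could contain more than one y==5 row (the question assumes it cannot), the same filter-first idea can be kept and overlapping windows merged. This is a minimal sketch under that relaxed assumption, not something taken from the question; the helper name `rows_around` and its parameter `n` are hypothetical, and it tracks row numbers in an explicit `rn` column on a copy of the table.

```r
library(data.table)

DT <- fread("grp y exclude
a 1 0
a 2 0
a 3 0
a 4 1
a 5 0
a 7 1
a 8 0
a 9 0
a 10 0
b 1 0
b 2 0
b 3 0
b 4 1
b 5 0
b 6 1
b 7 1
b 8 0
b 9 0
b 10 0
c 5 1
d 1 0")

# Hypothetical helper: per group, return the rows that lie within n positions
# of any y==5 row, where positions are counted after dropping excluded rows.
rows_around <- function(dt, n = 2L) {
  # Work on a copy so the caller's table does not gain an rn column.
  kept <- copy(dt)[, rn := .I][exclude == 0 | y == 5]
  picked <- kept[, {
    hits <- which(y == 5)  # positions of matches inside the filtered group
    # TRUE for every position close enough to at least one match;
    # a group without matches yields all FALSE and contributes no rows.
    near <- vapply(seq_len(.N), function(i) any(abs(i - hits) <= n), logical(1))
    list(rn = rn[near])
  }, by = grp]$rn
  dt[picked]
}

out <- rows_around(DT)
print(out)
```

On the example data, where each group has at most one match, this should select the same rows as the code above.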

data.table: Select n specific rows before & after other rows meeting a condition

2 answers: