data.table: select n specific rows before & after other rows that meet a condition

Time: 2017-03-15 22:22:52

Tags: r data.table

Given the following example data table:

    library(data.table)
    DT <- fread("grp y exclude
    a 1 0
    a 2 0
    a 3 0
    a 4 1
    a 5 0
    a 7 1
    a 8 0
    a 9 0
    a 10 0
    b 1 0
    b 2 0
    b 3 0
    b 4 1
    b 5 0
    b 6 1
    b 7 1
    b 8 0
    b 9 0
    b 10 0
    c 5 1
    d 1 0")

I want to select

  1. by group grp
  2. all rows that have y==5
  3. and up to two rows before and after each row from 2. within the group,
  4. but of the rows from 3. only those that have exclude==0.

Assuming each group has at most one row with y==5, this yields the desired result for 1.-3.:

    idx <- -2:2 # 2 rows before match, the matching row itself, and two rows after match
    (row_numbers <- DT[,.I[{
                             x <- rep(which(y==5),each=length(idx))+idx 
                             x[x>0 & x<=.N]
                           }], by=grp]$V1)
    # [1]  3  4  5  6  7 12 13 14 15 16 20
    DT[row_numbers]
    #     grp y exclude
    #  1:   a 3       0
    #  2:   a 4       1
    #  3:   a 5       0 # y==5 + two rows before and two rows after
    #  4:   a 7       1
    #  5:   a 8       0
    #  6:   b 3       0
    #  7:   b 4       1
    #  8:   b 5       0 # y==5 + two rows before and two rows after
    #  9:   b 6       1
    # 10:   b 7       1
    # 11:   c 5       1 # y==5 + nothing, because the group has only 1 element

But how do I incorporate 4. so that I get the following?

    #     grp y exclude
    #  1:   a 2       0
    #  2:   a 3       0
    #  3:   a 5       0
    #  4:   a 8       0
    #  5:   a 9       0
    #  6:   b 2       0
    #  7:   b 3       0
    #  8:   b 5       0
    #  9:   b 8       0
    # 10:   b 9       0
    # 11:   c 5       1

It feels like I am close, but I think I have been staring at head for too long now, so I would appreciate some fresh ideas.

2 answers:

Answer 0 (score: 6)

A bit simpler:

DT[DT[, rn := .I][exclude==0 | y==5][, rn[abs(.I - .I[y==5]) <= 2], by=grp]$V1]

 #   grp y exclude rn
 #1:   a 2       0  2
 #2:   a 3       0  3
 #3:   a 5       0  5
 #4:   a 8       0  7
 #5:   a 9       0  8
 #6:   b 2       0 11
 #7:   b 3       0 12
 #8:   b 5       0 14
 #9:   b 8       0 17
#10:   b 9       0 18
#11:   c 5       1 20
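
The one-liner above chains three steps, and it also adds the helper column rn to DT by reference. As a reading aid, here is a minimal sketch of the same logic split into named steps and run on a copy, so DT itself stays unchanged. The names tmp, cand and keep are made up for this illustration and are not part of the answer.

```r
library(data.table)

DT <- fread("grp y exclude
a 1 0
a 2 0
a 3 0
a 4 1
a 5 0
a 7 1
a 8 0
a 9 0
a 10 0
b 1 0
b 2 0
b 3 0
b 4 1
b 5 0
b 6 1
b 7 1
b 8 0
b 9 0
b 10 0
c 5 1
d 1 0")

# Step 1: remember each row's position in DT (on a copy, so DT gets no rn column).
tmp <- copy(DT)[, rn := .I]

# Step 2: keep only candidate rows: non-excluded rows plus the y==5 rows themselves.
cand <- tmp[exclude == 0 | y == 5]

# Step 3: per group, keep candidates at most 2 positions away from the y==5 row.
# .I is the row position within cand; groups without y==5 contribute nothing.
keep <- cand[, rn[abs(.I - .I[y == 5]) <= 2], by = grp]$V1
keep
# [1]  2  3  5  7  8 11 12 14 17 18 20

DT[keep]
```

Because the distance is measured after the excluded rows are filtered out, "two rows before and after" counts only rows that are allowed in the result, which is what point 4. of the question asks for.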

Answer 1 (score: 5)

You are very close. This should do it:

row_numbers <- DT[exclude==0 | y==5, .I[{
    x <- rep(which(y==5), each=length(idx)) + idx 
    x[x>0 & x<=.N]
  }], by=grp]$V1
DT[row_numbers]
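
The snippet above relies on idx <- -2:2 from the question and on there being at most one y==5 row per group. If either assumption is inconvenient, the same idea can be wrapped in a small function. The following is only a sketch: select_around, its parameter n and the temporary column near are hypothetical names introduced here, and the function works on within-group positions rather than on .I.

```r
library(data.table)

DT <- fread("grp y exclude
a 1 0
a 2 0
a 3 0
a 4 1
a 5 0
a 7 1
a 8 0
a 9 0
a 10 0
b 1 0
b 2 0
b 3 0
b 4 1
b 5 0
b 6 1
b 7 1
b 8 0
b 9 0
b 10 0
c 5 1
d 1 0")

# Hypothetical helper: keep the y==5 rows plus up to n non-excluded neighbours
# on each side, per group. Assumes columns grp, y and exclude exist in dt.
select_around <- function(dt, n = 2L) {
  # Subsetting with i returns a new table, so dt is not modified below.
  cand <- dt[exclude == 0 | y == 5]
  # Flag rows whose within-group position is within n of any y==5 position.
  cand[, near := {
    hit <- which(y == 5)
    vapply(seq_len(.N), function(p) any(abs(p - hit) <= n), logical(1))
  }, by = grp]
  out <- cand[(near)]
  out[, near := NULL]   # drop the temporary flag column
  out[]
}

res <- select_around(DT, n = 2L)
res
```

With n = 2L this should reproduce the eleven rows the question asks for; groups without a y==5 row, such as d, drop out because any() over an empty comparison is FALSE.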