Getting the row counts of subsets of a data.table

Posted: 2016-10-25 13:46:19

Tags: r data.table

library(data.table)

dt = data.table(ID1 = rep("A", 10),
                ID2 = rep("B", 10),
                sig = c(0, 1, 0, 0, 0, 0, 1, 0, -1, 0))
dt = rbind(dt, dt)
dt
    ID1 ID2 sig
 1:   A   B   0
 2:   A   B   1
 3:   A   B   0
 4:   A   B   0
 5:   A   B   0
 6:   A   B   0
 7:   A   B   1
 8:   A   B   0
 9:   A   B  -1
10:   A   B   0
11:   A   B   0
12:   A   B   1
13:   A   B   0
14:   A   B   0
15:   A   B   0
16:   A   B   0
17:   A   B   1
18:   A   B   0
19:   A   B  -1
20:   A   B   0

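I want to extract the rows from sig = 1 to sig = -1, and from sig = -1 to sig = 1 (each block starts at the row where the signal switches and ends at the row of the next opposite signal, both included). The output should look like this: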

output1=dt[2:9]
output2=dt[9:12]
 2:   A   B   1
 3:   A   B   0
 4:   A   B   0
 5:   A   B   0
 6:   A   B   0
 7:   A   B   1
 8:   A   B   0
 9:   A   B  -1

The final output I need is:

cluster1=dim(output1)[1]
cluster2=dim(output2)[1]

I have nearly 5000 such rows from which I need to extract these blocks of data. Any pointers in the right direction would help.

1 Answer

Answer (score: 3)

I would probably do something like this:

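# Join on sig to collect the row numbers (w) of every -1 and every 1
wDT = dt[.(sig = c(-1, 1)), on = "sig", .(w = .I), by = .EACHI]
# Put the signals back into row order
setorder(wDT, w)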

#    sig  w
# 1:   1  2
# 2:   1  7
# 3:  -1  9
# 4:   1 12
# 5:   1 17
# 6:  -1 19
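The join above collects the row numbers of every non-zero signal. A possibly simpler way to build the same table, assuming sig only ever takes the values -1, 0 and 1, is to number all rows with .I first and then filter; because the filter keeps the original row order, no setorder() call is needed. This is an alternative sketch, not the answer's code, and wDT2 is a hypothetical name.

```r
library(data.table)

# Rebuild the example table from the question
dt <- data.table(ID1 = rep("A", 10), ID2 = rep("B", 10),
                 sig = c(0, 1, 0, 0, 0, 0, 1, 0, -1, 0))
dt <- rbind(dt, dt)

# Without `by`, .I is simply 1..nrow(dt), so every row gets its position;
# keeping only the non-zero signals afterwards preserves row order
# (assumes sig contains only -1, 0 and 1)
wDT2 <- dt[, .(sig, w = .I)][sig != 0]
wDT2$w
# [1]  2  7  9 12 17 19
```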

# Keep the first row of each run of identical signs, i.e. the switch points
switchDT = wDT[, .SD[1L], by = .(g = rleid(sig))]

#    g  w
# 1: 1  2
# 2: 2  9
# 3: 3 12
# 4: 4 19
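The grouping key rleid(sig) is what makes this step work: it gives each run of identical consecutive signs its own id, so taking the first row of each run keeps only the rows where the sign switches. A small illustration on the sorted values from wDT, typed in by hand:

```r
library(data.table)

# Sorted signals and row numbers, copied from wDT above
sig <- c(1, 1, -1, 1, 1, -1)
w   <- c(2L, 7L, 9L, 12L, 17L, 19L)

# One id per run of identical consecutive values
g <- rleid(sig)
g
# [1] 1 1 2 3 3 4

# First w in each run, mirroring .SD[1L] grouped by g
w[!duplicated(g)]
# [1]  2  9 12 19
```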

So (thanks to @DavidArenburg for the simplification)...

switchDT[, diff(w) + 1L ]
# [1] 8 4 8
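diff(w) + 1L counts the rows of each block inclusively: a block running from row a to row b has b - a + 1 rows, so the first block has 9 - 2 + 1 = 8. Since the question also asks for the blocks themselves (output1, output2, ...), here is a sketch that slices them out of dt using the switch rows and checks that their nrow() agrees; the names switches and blocks are hypothetical, and the switch rows are hard-coded from switchDT$w above.

```r
library(data.table)

dt <- data.table(ID1 = rep("A", 10), ID2 = rep("B", 10),
                 sig = c(0, 1, 0, 0, 0, 0, 1, 0, -1, 0))
dt <- rbind(dt, dt)

# Switch rows, as found in switchDT$w
switches <- c(2L, 9L, 12L, 19L)

# Consecutive switch rows bound each block; both ends are included,
# so neighbouring blocks share their boundary row (e.g. row 9)
blocks <- lapply(seq_len(length(switches) - 1L), function(k)
  dt[switches[k]:switches[k + 1L]])

# nrow() of each block is the question's dim(output)[1]
sizes <- vapply(blocks, nrow, integer(1))
sizes
# [1] 8 4 8
identical(sizes, diff(switches) + 1L)
# [1] TRUE
```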

A shorter approach, suggested by David, skips creating switchDT:

wDT[!duplicated(rleid(sig)), diff(w) + 1L ]
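If data.table is not at hand, the same logic can be sketched in base R, assuming sig is a plain numeric vector taking only the values -1, 0 and 1. segment_lengths is a hypothetical helper name, and run detection here compares neighbouring signs instead of calling rleid(). It is fully vectorised, so a few thousand rows should not be a problem.

```r
# Hypothetical helper (not from the original answer): inclusive row counts
# of the blocks running from each sign switch to the next one
segment_lengths <- function(sig) {
  # Row positions of every non-zero signal, in row order
  w <- which(sig != 0)
  s <- sig[w]
  # TRUE where the sign differs from the previous non-zero signal,
  # i.e. the first row of each run of identical signs
  keep <- c(TRUE, s[-1] != s[-length(s)])
  switches <- w[keep]
  # Inclusive count between consecutive switch rows
  diff(switches) + 1L
}

sig <- rep(c(0, 1, 0, 0, 0, 0, 1, 0, -1, 0), 2)
segment_lengths(sig)
# [1] 8 4 8
```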