library(data.table)

dt = data.table(ID1 = rep("A", 10),
                ID2 = rep("B", 10),
                sig = c(0, 1, 0, 0, 0, 0, 1, 0, -1, 0))
dt = rbind(dt, dt)
dt
    ID1 ID2 sig
 1:   A   B   0
 2:   A   B   1
 3:   A   B   0
 4:   A   B   0
 5:   A   B   0
 6:   A   B   0
 7:   A   B   1
 8:   A   B   0
 9:   A   B  -1
10:   A   B   0
11:   A   B   0
12:   A   B   1
13:   A   B   0
14:   A   B   0
15:   A   B   0
16:   A   B   0
17:   A   B   1
18:   A   B   0
19:   A   B  -1
20:   A   B   0

I want to extract the rows from the first sig = 1 up to the next sig = -1, and then from that sig = -1 up to the next sig = 1. The output should look like this:
output1=dt[2:9]
output2=dt[9:12]
output1
   ID1 ID2 sig
1:   A   B   1
2:   A   B   0
3:   A   B   0
4:   A   B   0
5:   A   B   0
6:   A   B   1
7:   A   B   0
8:   A   B  -1

The final output I need is the number of rows in each chunk:
cluster1=dim(output1)[1]
cluster2=dim(output2)[1]
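I have nearly 5000 such rows from which I need to extract chunks of data. Any pointers in the right direction would be helpful.

Answer 0 (score: 3)

I would probably do...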
wDT = dt[.(sig = c(-1,1)), on="sig", .(w = .I), by=.EACHI]
setorder(wDT, w)
#    sig  w
# 1:   1  2
# 2:   1  7
# 3:  -1  9
# 4:   1 12
# 5:   1 17
# 6:  -1 19
switchDT = wDT[, .SD[1L], by=.(g = rleid(sig))]
#    g  w
# 1: 1  2
# 2: 2  9
# 3: 3 12
# 4: 4 19
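For anyone unfamiliar with rleid: it assigns one id per run of consecutive equal values, which is why taking the first row of each id keeps only the rows where the signal switches. A minimal illustration on the sig column of wDT above (the name run_id is just for this sketch):

```r
library(data.table)

# The sig values of wDT in row order: 1, 1, -1, 1, 1, -1
sig <- c(1, 1, -1, 1, 1, -1)

# One id per run of consecutive equal values
run_id <- rleid(sig)
run_id
# [1] 1 1 2 3 3 4
```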
So (thanks to @DavidArenburg for the simplification)...
switchDT[, diff(w) + 1L ]
# [1] 8 4 8
A shorter way, suggested by David, is to skip creating switchDT:
wDT[!duplicated(rleid(sig)), diff(w) + 1L ]
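If the chunks themselves are needed and not only their sizes, the same switch positions can be used to slice the table. This is only a sketch, assuming the nonzero values of sig are always 1 or -1 as in the example; the names w, sw, chunks and cluster_sizes are made up for illustration, and which() is used here in place of the join above:

```r
library(data.table)

# Same example data as in the question
dt <- data.table(ID1 = "A", ID2 = "B",
                 sig = rep(c(0, 1, 0, 0, 0, 0, 1, 0, -1, 0), 2))

# Row numbers that carry a signal, in row order
w <- which(dt$sig != 0)

# Keep only the first row of each run of identical signals (the switch points)
sw <- w[!duplicated(rleid(dt$sig[w]))]

# Pair each switch point with the next one and slice the table;
# neighbouring chunks share their boundary row, as output1 and output2 do
chunks <- Map(function(from, to) dt[from:to], head(sw, -1L), sw[-1L])

# Number of rows in each chunk
cluster_sizes <- vapply(chunks, nrow, integer(1))
cluster_sizes
# [1] 8 4 8
```

Here chunks[[1]] corresponds to output1 and chunks[[2]] to output2 from the question.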