我很难尝试在R中对data.table
(包)进行子集化。给出以下示例
library(data.table)
x = c(rep("a", 6), rep("b", 5))
y = c(0,2,1,0,1,2, 0,1,0,2,1)
z = c(1:6,1:5) + rnorm(11, 0.02, 0.1)
DT = data.table(ind = x, cond = y, dist = z)
ind cond dist
[1,] a 0 1.078966
[2,] a 2 1.987159
[3,] a 1 3.143391
[4,] a 0 3.937058
[5,] a 1 5.037681
[6,] a 2 6.036432
[7,] b 0 1.057809
[8,] b 1 2.144755
[9,] b 0 3.010903
[10,] b 2 3.937765
[11,] b 1 4.976273
我希望在1
列中的第一个cond
之后对所有内容进行子集化。换句话说,3.143391
的{{1}}和a
2.144755
的所有内容都大于b
。
DT.sub <- DT[cond == "1",] # Please, combine this row
DT.sub[,.SD[dist==min(dist)],by=ind] # With this to make the code shorter, if you can.
ind cond dist
[1,] a 1 3.143391
[2,] b 1 2.144755
结果应如下所示:
ind cond dist
[1,] a 0 3.937058
[2,] a 1 5.037681
[3,] a 2 6.036432
[4,] b 0 3.010903
[5,] b 2 3.937765
[6,] b 1 4.976273
答案 0 :(得分:3)
怎么样:
DT[,.SD[seq(match(1,cond)+1,.N)],by=ind]
ind cond dist
[1,] a 0 3.937058
[2,] a 1 5.037681
[3,] a 2 6.036432
[4,] b 0 3.010903
[5,] b 2 3.937765
[6,] b 1 4.976273
不过,首先set.seed(1)
是好的,所以我们可以使用相同的随机数据。