Subsetting a data.table with subset limits that change between factor levels

Time: 2012-06-27 12:10:56

Tags: r subset data.table

I'm having trouble subsetting a data.table (from the data.table package) in R. Given the following example:

library(data.table)

x = c(rep("a", 6), rep("b", 5))
y = c(0,2,1,0,1,2, 0,1,0,2,1)
z = c(1:6,1:5) + rnorm(11, 0.02, 0.1)

DT = data.table(ind = x, cond = y, dist = z)

      ind cond     dist
 [1,]   a    0 1.078966
 [2,]   a    2 1.987159
 [3,]   a    1 3.143391
 [4,]   a    0 3.937058
 [5,]   a    1 5.037681
 [6,]   a    2 6.036432
 [7,]   b    0 1.057809
 [8,]   b    1 2.144755
 [9,]   b    0 3.010903
[10,]   b    2 3.937765
[11,]   b    1 4.976273

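I want to subset everything after the first 1 in column cond. In other words, everything whose dist is larger than 3.143391 for a and 2.144755 for b. I can find those two limits like this: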

DT.sub <- DT[cond == 1,]  # Please combine this line...
DT.sub[,.SD[dist==min(dist)],by=ind]  # ...with this one to make the code shorter, if you can.

  ind cond     dist
[1,]   a    1 3.143391
[2,]   b    1 2.144755
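The two lines above can likely be collapsed into one call, because in DT[i, j, by] the i filter (cond == 1) is applied before j is evaluated once per ind group. A minimal sketch, assuming a current data.table version; set.seed(1) and the name lims are additions for illustration only, so the dist values will differ from the printout above.

```r
library(data.table)

# Same construction as in the question; set.seed(1) is added only so the
# random dist values are reproducible.
set.seed(1)
x <- c(rep("a", 6), rep("b", 5))
y <- c(0, 2, 1, 0, 1, 2, 0, 1, 0, 2, 1)
z <- c(1:6, 1:5) + rnorm(11, 0.02, 0.1)
DT <- data.table(ind = x, cond = y, dist = z)

# i keeps only the cond == 1 rows, then j picks the smallest dist per ind,
# so the filter and the grouped step happen in a single call.
lims <- DT[cond == 1, .SD[dist == min(dist)], by = ind]
print(lims)
```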

The result should look like this:

      ind cond     dist
 [1,]   a    0 3.937058
 [2,]   a    1 5.037681
 [3,]   a    2 6.036432
 [4,]   b    0 3.010903
 [5,]   b    2 3.937765
 [6,]   b    1 4.976273
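The "in other words" formulation can be sketched as a per-group threshold: take the dist of the cond == 1 row with the smallest dist in each ind, attach it to every row of that group, and keep the rows strictly above it. This is a sketch using merge(), not the accepted answer's method; the names lims, lim and res are hypothetical, and it only reproduces the result above because dist increases within each group.

```r
library(data.table)

# Reproducible version of the question's data (set.seed(1) is an addition).
set.seed(1)
x <- c(rep("a", 6), rep("b", 5))
y <- c(0, 2, 1, 0, 1, 2, 0, 1, 0, 2, 1)
z <- c(1:6, 1:5) + rnorm(11, 0.02, 0.1)
DT <- data.table(ind = x, cond = y, dist = z)

# Hypothetical column `lim`: the per-group limit (dist of the first cond == 1
# row, which is also the smallest one because dist is sorted within ind).
lims <- DT[cond == 1, list(lim = min(dist)), by = ind]

# Attach each group's limit to its rows and keep only rows strictly above it.
res <- merge(DT, lims, by = "ind")[dist > lim]
res[, lim := NULL]  # drop the helper column again
print(res)
```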

1 answer:

Answer 0 (score: 3)

How about:

DT[,.SD[seq(match(1,cond)+1,.N)],by=ind]
     ind cond     dist 
[1,]   a    0 3.937058 
[2,]   a    1 5.037681 
[3,]   a    2 6.036432 
[4,]   b    0 3.010903 
[5,]   b    2 3.937765 
[6,]   b    1 4.976273 
It's a good idea to call set.seed(1) first, though, so that we can all work with the same random data.
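In the answer, match(1, cond) returns the position of the first 1 within each group (3 for a, 2 for b), so seq() selects rows 4 to 6 of a and rows 3 to 5 of b. Two edge cases seem worth guarding against: a group with no 1 makes match() return NA, and seq() then stops with an error; a group whose first 1 is its last row gives seq(.N + 1, .N), which counts down to c(.N + 1, .N) and returns an NA row plus the last row. A hedged variant that compares row positions instead, using nomatch = .N so such groups simply return no rows (DT2, res and edge are hypothetical names used only for this check):

```r
library(data.table)

# Reproducible version of the question's data, as suggested above.
set.seed(1)
x <- c(rep("a", 6), rep("b", 5))
y <- c(0, 2, 1, 0, 1, 2, 0, 1, 0, 2, 1)
z <- c(1:6, 1:5) + rnorm(11, 0.02, 0.1)
DT <- data.table(ind = x, cond = y, dist = z)

# Keep rows whose position within the group is after the first cond == 1.
# If a group has no 1, nomatch = .N makes the comparison FALSE for every row.
res <- DT[, .SD[seq_len(.N) > match(1, cond, nomatch = .N)], by = ind]
print(res)

# Edge cases: "c" has no 1 at all, "d" has its only 1 in the last row.
DT2 <- data.table(ind = c("c", "c", "d", "d"),
                  cond = c(0, 2, 0, 1),
                  dist = c(1, 2, 1, 2))
edge <- DT2[, .SD[seq_len(.N) > match(1, cond, nomatch = .N)], by = ind]
print(edge)  # no rows selected for either group
```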