Subsetting a data set using combinations of factor levels

Time: 2013-10-04 23:54:46

Tags: r

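I have a data set and want to split it into multiple data sets using all possible combinations of the levels of a particular factor, so that each combination contains 4 levels and appears exactly once.

Here is some code that generates a very simple example: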

  data<-cbind(rep(1:8,each=2),matrix(rnorm(64, mean = 0, sd = 1), nrow=16, ncol=4))
  colnames(data)<-LETTERS[1:5]

> data
      A           B          C          D           E
 [1,] 1 -0.07929477 -1.2946058 -1.4072064  0.57159386
 [2,] 1  1.83963909 -1.1723990  1.1232986  0.39483666
 [3,] 2 -0.36423210  1.3240148  1.3274450 -0.96929628

 [14,] 7  1.46756745 -0.7885119 -0.4218986 -1.25255228
 [15,] 8 -0.42291051  0.2915121  0.4320183  1.37582031
 [16,] 8 -0.40031215  0.4627476 -0.4145012  0.28700559

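"A" is a factor with 8 levels. I want to select every possible combination of 4 levels out of the 8 (i.e. 1 2 3 4, 1 2 3 5, etc.) and use these combinations to split "data" into multiple data sets for further analysis.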

1 answer:

Answer 0: (score: 2)

Here you go:

## Generate all combinations of 4 integers between 1 and 8
ii <- combn(1:8, 4, simplify=FALSE)

## Use those combinations to pick out desired rows in data
x <- lapply(ii, function(II) data[data[,"A"] %in% II, ])

## Check that it worked
x[[1]]
#      A         B           C          D          E
# [1,] 1 2.7963535 -1.01141834  0.9133376 -1.3128354
# [2,] 1 1.9346950  0.85907646 -0.2222619 -0.8143439
# [3,] 2 2.2966139 -2.43140014 -0.4276004  0.4425973
# [4,] 2 0.9046734 -0.30193977 -0.1641523  1.2068400
# [5,] 3 0.8836684  2.59911207 -0.4339402  0.8922918
# [6,] 3 0.9004662  0.31611677  0.9300422 -0.4947400
# [7,] 4 1.0590443 -0.70879715 -0.2357002  1.0907113
# [8,] 4 1.6175373 -0.02734472  0.9151199 -0.8994856

x[[70]]
#      A          B          C           D           E
# [1,] 5  1.2375211 -0.8635894 -0.32504939 -0.38956232
# [2,] 5  1.0631257  1.7598401 -0.36029628  1.34065065
# [3,] 6  0.4014502 -0.9167007 -0.37284132  0.90406595
# [4,] 6  1.3352802 -1.4181380  0.27940665 -0.73645846
# [5,] 7  0.3892974  1.8418089  0.39443361  0.10841747
# [6,] 7  0.2152083 -0.4404339 -1.72481747 -0.03888857
# [7,] 8 -1.8517170  0.3844379 -0.04383212  1.02553227
# [8,] 8 -0.6770360 -2.0134745  1.71437731 -0.49894527
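
As a follow-up check, here is a minimal sketch that confirms the list holds one subset per combination and makes the subsets easier to look up. It rebuilds the example matrix from the question. The fixed seed, the `drop = FALSE` safeguard, and the "1-2-3-4" naming scheme are assumptions added for illustration and are not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Rebuild the example matrix from the question (16 rows, columns A..E)
data <- cbind(rep(1:8, each = 2), matrix(rnorm(64), nrow = 16, ncol = 4))
colnames(data) <- LETTERS[1:5]

# All 4-element combinations of the 8 levels
ii <- combn(1:8, 4, simplify = FALSE)

# drop = FALSE keeps each subset a matrix even if only one row matches
x <- lapply(ii, function(lv) data[data[, "A"] %in% lv, , drop = FALSE])

# Hypothetical naming scheme: label each subset by its levels, e.g. "1-2-3-4"
names(x) <- vapply(ii, paste, character(1), collapse = "-")

length(x)                      # number of subsets, equal to choose(8, 4)
unique(x[["5-6-7-8"]][, "A"])  # levels present in the last subset
```

With names in place, a subset can be retrieved by its levels rather than by position, and `lapply(x, ...)` or `sapply(x, ...)` can then run the same analysis over every subset.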