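I have a dataset and want to subset it into multiple datasets using all possible combinations of the levels of one factor, so that each subset contains 4 levels and each combination appears exactly once.
Here is some code that generates a very simple example: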
data <- cbind(rep(1:8, each = 2), matrix(rnorm(16 * 4, mean = 0, sd = 1), nrow = 16, ncol = 4))
colnames(data)<-LETTERS[1:5]
> data
      A B C D E
[1,] 1 -0.07929477 -1.2946058 -1.4072064 0.57159386
[2,] 1 1.83963909 -1.1723990 1.1232986 0.39483666
[3,] 2 -0.36423210 1.3240148 1.3274450 -0.96929628
[14,] 7 1.46756745 -0.7885119 -0.4218986 -1.25255228
[15,] 8 -0.42291051 0.2915121 0.4320183 1.37582031
[16,] 8 -0.40031215 0.4627476 -0.4145012 0.28700559
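"A" is a factor with 8 levels. I want to select every possible combination of 4 of those 8 levels (i.e. 1 2 3 4, 1 2 3 5, etc.) and use those combinations to split "data" into multiple datasets for further analysis.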
Answer 0 (score: 2)
Here you go:
## Generate all combinations of 4 integers between 1 and 8
ii <- combn(1:8, 4, simplify=FALSE)
## Use those combinations to pick out desired rows in data
x <- lapply(ii, function(II) data[data[,"A"] %in% II, ])
## Check that it worked
x[[1]]
# A B C D E
# [1,] 1 2.7963535 -1.01141834 0.9133376 -1.3128354
# [2,] 1 1.9346950 0.85907646 -0.2222619 -0.8143439
# [3,] 2 2.2966139 -2.43140014 -0.4276004 0.4425973
# [4,] 2 0.9046734 -0.30193977 -0.1641523 1.2068400
# [5,] 3 0.8836684 2.59911207 -0.4339402 0.8922918
# [6,] 3 0.9004662 0.31611677 0.9300422 -0.4947400
# [7,] 4 1.0590443 -0.70879715 -0.2357002 1.0907113
# [8,] 4 1.6175373 -0.02734472 0.9151199 -0.8994856
x[[70]]
# A B C D E
# [1,] 5 1.2375211 -0.8635894 -0.32504939 -0.38956232
# [2,] 5 1.0631257 1.7598401 -0.36029628 1.34065065
# [3,] 6 0.4014502 -0.9167007 -0.37284132 0.90406595
# [4,] 6 1.3352802 -1.4181380 0.27940665 -0.73645846
# [5,] 7 0.3892974 1.8418089 0.39443361 0.10841747
# [6,] 7 0.2152083 -0.4404339 -1.72481747 -0.03888857
# [7,] 8 -1.8517170 0.3844379 -0.04383212 1.02553227
# [8,] 8 -0.6770360 -2.0134745 1.71437731 -0.49894527
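
If it helps, here is a small self-contained sketch of how the result could be sanity-checked and labelled. It rebuilds the example data, then names each subset after the levels it contains. The seed and the "1-2-3-4" naming scheme are my own assumptions for illustration, not part of the question.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

## Rebuild the example data: column A holds levels 1..8, two rows per level
data <- cbind(rep(1:8, each = 2), matrix(rnorm(16 * 4), nrow = 16, ncol = 4))
colnames(data) <- LETTERS[1:5]

## Same approach as above: one subset per combination of 4 levels
ii <- combn(1:8, 4, simplify = FALSE)
x <- lapply(ii, function(II) data[data[, "A"] %in% II, ])

## Label each subset with the levels it contains (hypothetical naming scheme)
names(x) <- vapply(ii, paste, character(1), collapse = "-")

## There should be choose(8, 4) subsets, each with 4 levels x 2 rows
length(x) == choose(8, 4)
all(vapply(x, nrow, integer(1)) == 8)

## Subsets can now be looked up by their combination
unique(x[["5-6-7-8"]][, "A"])
```

With names attached, a later step such as lapply(x, some_analysis) keeps track of which combination produced which result (some_analysis is a placeholder for whatever you run next).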