Question

I have a very large data set:

library(gtools)
a<-permutations(2,20,v=c(0,1),repeats.allowed=TRUE)
a<-as.data.frame(a)

I also have a matrix:

set.seed(123)
b<-replicate(5,sample(1:20,5, replace=T))
b<-t(b)

For each row of 'a', I want to select the columns specified by each column of 'b'.

To do this, I run the following:

for (i in 1:nrow(a)) sapply(1:ncol(b), function(y) a[i,c(as.vector(b[,y]))])

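So, for each row of 'a', I need a matrix or data frame containing the selected columns of 'a'.

The problem is that this process is very slow. I would like to know whether there is a faster way to do it.

The example above shows how slow the process is. Here is a smaller example: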

library(gtools)
a<-permutations(2,5,v=c(0,1),repeats.allowed=TRUE)
a<-as.data.frame(a)

set.seed(123)
b<-replicate(5,sample(1:5,5, replace=T))
b<-t(b)

This is what I want, step by step:

1. Select the i-th row of 'a'.
2. Select the y-th column of 'b'.
3. Select those elements in the i-th row of 'a' that are specified by the y-th column of 'b'.
4. Repeat 2. and 3. until all columns of 'b' have been used.

For a given row i, this is done using:

sapply(1:ncol(b), function(y) a[i,c(as.vector(b[,y]))])

This is then repeated for every row of 'a', which is done by adding a for loop:

for (i in 1:nrow(a)) sapply(1:ncol(b), function(y) a[i,c(as.vector(b[,y]))])

Answer 1

Using a smaller subset of 'a':

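 # Work on the first 22 rows of 'a' for illustration
 a1 <- a[1:22,]
 # c(b) flattens 'b' column by column, so a single indexing call selects every requested column
 a2 <- as.matrix(a1[,c(b)])

 # Split by row, then reshape each row into a matrix with ncol(b) columns
 res1 <- lapply(split(a2, row(a2)), function(x) { matrix(x,ncol=ncol(b))})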
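
As a sanity check, the vectorized selection can be compared with the row-by-row `sapply` version from the question. The sketch below is only an illustration under some assumptions: it builds the binary rows with base R's `expand.grid()` instead of `gtools::permutations()` so that it runs without extra packages, and the names `res_vec`, `res_loop` and `all_equal` are made up for this example.

```r
# Stand-in for the small example: all 2^5 binary rows, as a plain matrix.
# (Assumption: expand.grid() replaces gtools::permutations() here.)
a <- as.matrix(expand.grid(rep(list(0:1), 5))[, 5:1])
set.seed(123)
b <- t(replicate(5, sample(1:5, 5, replace = TRUE)))

# Vectorized version: one indexing call, then one reshape per row.
a2 <- a[, c(b)]                                   # 32 x 25 matrix
res_vec <- lapply(split(a2, row(a2)),
                  function(x) matrix(x, ncol = ncol(b)))

# Row-by-row version, following the loop in the question.
res_loop <- lapply(seq_len(nrow(a)), function(i)
  sapply(seq_len(ncol(b)), function(y) a[i, b[, y]]))

# TRUE if every per-row matrix agrees between the two versions.
all_equal <- all(mapply(function(u, v) all(u == v), res_vec, res_loop))
```

Column y of each result matrix holds the elements of that row picked out by `b[, y]`, which is why flattening `b` column by column and reshaping with `ncol(b)` columns gives the same layout as the loop.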

Or keep it in an array:

 arr1 <- array(t(a2), dim=c(5,5,22))

res1[[22]]
#      [,1] [,2] [,3] [,4] [,5]
#[1,]    0    1    0    1    0
#[2,]    0    0    1    0    0
#[3,]    1    0    0    0    0
#[4,]    1    0    0    0    1
#[5,]    1    0    0    1    0

arr1[,,22]
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    1    0    1    0
# [2,]    0    0    1    0    0
# [3,]    1    0    0    0    0
# [4,]    1    0    0    0    1
# [5,]    1    0    0    1    0
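
The array form can be written without hard-coding the dimensions, which may help when moving from the 22-row subset to more rows. This is a rough sketch rather than a tuned solution: `n_bits` is a hypothetical parameter kept small here so the code runs quickly (the question uses 20), and `expand.grid()` again stands in for `gtools::permutations()`. Keeping 'a' as a matrix instead of a data frame is a design choice assumed to help, since matrix indexing avoids data-frame overhead.

```r
n_bits <- 10   # hypothetical size for the sketch; the question uses 20
a <- as.matrix(expand.grid(rep(list(0:1), n_bits))[, n_bits:1])
set.seed(123)
b <- t(replicate(5, sample(seq_len(n_bits), 5, replace = TRUE)))

# Select all requested columns at once: nrow(a) x (nrow(b) * ncol(b)).
a2 <- a[, c(b)]

# One nrow(b) x ncol(b) slice per row of 'a'.
arr <- array(t(a2), dim = c(nrow(b), ncol(b), nrow(a)))

# arr[, , i] is the selection for row i; check one slice against the
# sapply() form used in the question.
check_7 <- all(arr[, , 7] == sapply(seq_len(ncol(b)), function(y) a[7, b[, y]]))
```

With 20 columns, `a2` would have 2^20 rows and 25 columns, roughly 26 million entries, so memory use is worth checking before running this at full size.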
