Question

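Suppose I have a large matrix (matrix_1) with 2000 columns. Each cell has a value of 0 or 1. I want to find the best combination of 10 columns, where the best combination is the one for which the most rows contain at least one non-zero value. In other words, taking matrix_2 to be the 10 selected columns, it maximizes: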

sum (apply (matrix_2, 1, function(x) any(x == 1)))

I cannot go through all possible combinations because that is too computationally intensive (there are 2.758988e+26 of them). Any suggestions?
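
As a quick sanity check on that figure, the number of ways to pick 10 columns out of 2000 can be computed with base R's choose(). This is only an illustrative sketch; the variable name n_combos is made up for this example.

```r
# Number of distinct 10-column subsets of a 2000-column matrix
n_combos <- choose(2000, 10)

# Print with 7 significant digits to compare against the figure in the question
print(format(n_combos, digits = 7))
```

Even at a billion subsets per second, enumerating this many candidates would be far out of reach, which is why a heuristic is needed.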

As an example, take this matrix with 4 rows, from which I pick only 2 columns at a time:

mat <- matrix (c(1, 0, 0, 0, 0, 0, 1, 0,  1, 0, 0, 1,  0, 0, 0, 0), nrow = 4,  byrow = FALSE)
mat
# combination of columns 2 and 3 is best: 3 rows with at least a single 1 value
sum (apply (mat[, c(2, 3)], 1, function(x) any (x == 1)))
# combination of columns  1 and 2 is worse: 2 rows with at least a single 1 value
sum (apply (mat[, c(1, 2)], 1, function(x) any (x == 1)))
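
The scoring expression above can also be written without apply(). The sketch below wraps it in a helper called coverage, a hypothetical name that does not appear in the original post, and assumes the matrix contains only 0s and 1s.

```r
# Hypothetical helper (not from the original post): number of rows that
# contain at least one 1 among the chosen columns.
coverage <- function(mat, cols) {
  # drop = FALSE keeps a matrix even when a single column is selected
  sum(rowSums(mat[, cols, drop = FALSE] == 1) > 0)
}

mat <- matrix(c(1, 0, 0, 0,  0, 0, 1, 0,  1, 0, 0, 1,  0, 0, 0, 0),
              nrow = 4, byrow = FALSE)

coverage(mat, c(2, 3))  # 3 rows covered, matching the comment above
coverage(mat, c(1, 2))  # 2 rows covered
```

rowSums() works on the whole submatrix at once, so this tends to be faster than calling an anonymous function per row on large inputs.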

Answer 1

You can use a function like this...

find10 <- function(mat, n = 10) {
  cols <- rep(FALSE, ncol(mat))  # columns already selected
  rows <- rep(TRUE, nrow(mat))   # rows not yet covered by a selected column
  for (i in seq_len(n)) {
    # drop = FALSE keeps a matrix even if only one uncovered row remains
    colsums <- colSums(mat[rows, , drop = FALSE])
    colsums[cols] <- -1          # exclude columns that were already picked
    maxcol <- which.max(colsums)
    cols[maxcol] <- TRUE
    rows <- rows & !as.logical(mat[, maxcol])
  }
  return(which(cols))
}

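It finds the column with the most non-zero entries, removes the rows covered by that column from the comparison, and repeats. It returns the column numbers of the n columns selected this way.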

An example...

m <- matrix(sample(0:1,100,prob = c(0.8,0.2),replace=TRUE),nrow=10)

m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    0    0    0    0    0    1    1     0
 [2,]    1    0    0    0    0    0    0    0    1     1
 [3,]    0    0    0    0    1    0    0    0    0     0
 [4,]    0    0    0    1    0    1    0    1    0     1
 [5,]    0    0    0    0    1    0    0    1    0     0
 [6,]    0    0    0    1    0    1    1    0    0     0
 [7,]    0    0    1    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    1     0
 [9,]    0    0    0    0    0    0    0    1    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

find10(m,5)
[1] 3 4 5 8 9

For the example you provided, it also returns columns 2 and 3.
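
Because the function picks one column at a time, it is a greedy heuristic and is not guaranteed to find the optimal set in general. On small inputs the result can be compared against an exhaustive search. The sketch below uses combn() for that; best_exhaustive is a hypothetical helper name, and the approach is only feasible when the number of combinations is small.

```r
# Hypothetical helper: exhaustive search, feasible only for small inputs.
best_exhaustive <- function(mat, n) {
  combos <- combn(ncol(mat), n)  # each column of combos is one candidate subset
  scores <- apply(combos, 2, function(cols) {
    # rows with at least one 1 among the candidate columns
    sum(rowSums(mat[, cols, drop = FALSE] == 1) > 0)
  })
  best <- which.max(scores)      # first optimum if several subsets tie
  list(cols = combos[, best], score = scores[best])
}

# The 4 x 4 matrix from the question
mat <- matrix(c(1, 0, 0, 0,  0, 0, 1, 0,  1, 0, 0, 1,  0, 0, 0, 0),
              nrow = 4, byrow = FALSE)

res <- best_exhaustive(mat, 2)
res$cols   # columns 2 and 3
res$score  # 3 rows covered
```

For this matrix the exhaustive optimum agrees with the columns reported above.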

Find the best combination of columns in a matrix
