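Suppose I have a large matrix (matrix_1) with 2000 columns, where every cell is either 0 or 1. I want to find the best combination of 10 columns, meaning the combination that maximizes the number of rows containing at least one non-zero value. In other words, if matrix_2 holds the 10 selected columns, I want to maximize
sum(apply(matrix_2, 1, function(x) any(x == 1)))
I can't check every possible combination because it is far too computationally expensive (there are 2.758988e+26 of them). Any suggestions?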
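Stated formally, with R_j denoting the set of rows where column j of matrix_1 is 1, the question asks for the following (this is the maximum coverage problem). The count of candidate combinations is the binomial coefficient below; the throughput of 10^9 combinations per second in the last line is an assumed, illustrative figure, not a measurement.

```latex
\max_{S \subseteq \{1,\dots,2000\},\ |S| = 10} \Bigl|\bigcup_{j \in S} R_j\Bigr|,
\qquad R_j = \{\, i : \text{matrix\_1}_{ij} = 1 \,\}

\binom{2000}{10} = \frac{2000!}{10!\,1990!} \approx 2.76 \times 10^{26}

\frac{2.76 \times 10^{26}}{10^{9}\ \text{s}^{-1}} \approx 2.76 \times 10^{17}\ \text{s} \approx 8.7 \times 10^{9}\ \text{years}
```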
For example, take this matrix with 4 rows, picking only 2 columns at a time:
mat <- matrix(c(1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0), nrow = 4, byrow = FALSE)
mat
# combination of columns 2 and 3 is best: 3 rows with at least a single 1 value
sum(apply(mat[, c(2, 3)], 1, function(x) any(x == 1)))
# combination of columns 1 and 2 is worse: 2 rows with at least a single 1 value
sum(apply(mat[, c(1, 2)], 1, function(x) any(x == 1)))
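To make the objective concrete, here is a vectorized way to score one combination, plus an exhaustive search with combn() that is only usable at toy sizes like the 4-row, 2-column example above. The names coverage() and best_by_brute_force() are hypothetical helpers, not part of the question; coverage() assumes the entries are 0/1 (or at least non-negative), so that rowSums(...) > 0 gives the same count as any(x == 1).

```r
# Hypothetical helper: number of rows with at least one non-zero entry
# among the chosen columns (assumes a 0/1 or non-negative matrix).
coverage <- function(mat, cols) {
  sum(rowSums(mat[, cols, drop = FALSE]) > 0)
}

# Hypothetical exhaustive search: scores every k-column combination.
# Only feasible when choose(ncol(mat), k) is small.
best_by_brute_force <- function(mat, k) {
  combos <- combn(ncol(mat), k)  # each column of `combos` is one combination
  scores <- apply(combos, 2, function(cols) coverage(mat, cols))
  list(cols = combos[, which.max(scores)], covered = max(scores))
}

mat <- matrix(c(1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0), nrow = 4)
best_by_brute_force(mat, 2)
```

On this matrix the search selects columns 2 and 3, covering 3 rows, which agrees with the hand check above.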
Answer:
You could use a function like this:
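find10 <- function(mat, n = 10) {
  cols <- rep(FALSE, ncol(mat))  # columns already selected
  rows <- rep(TRUE, nrow(mat))   # rows not yet covered by a selected column
  for (i in seq_len(n)) {
    # Count the uncovered rows each column would cover; drop = FALSE keeps
    # a matrix even when only one uncovered row remains.
    colsums <- colSums(mat[rows, , drop = FALSE])
    colsums[cols] <- -1  # exclude columns that are already selected
    maxcol <- which.max(colsums)  # ties go to the lowest column index
    cols[maxcol] <- TRUE
    rows <- rows & !as.logical(mat[, maxcol])  # mark newly covered rows
  }
  return(which(cols))
}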
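It picks the column with the most non-zero values, removes the rows that column covers from further comparison, and repeats. It returns the indices of the n columns it selected. Note that this is a greedy heuristic: it is fast, but it is not guaranteed to find the best combination. For this kind of maximum-coverage problem, greedy selection is known to cover at least (1 - 1/e), roughly 63%, as many rows as the optimal combination.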
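The sketch below expresses the same greedy idea without subsetting rows, and includes a small hand-built matrix on which greedy is not optimal. greedy_cover() and cx are hypothetical names; the marginal gain of each column is computed as crossprod(mat, uncovered), which assumes a numeric 0/1 matrix.

```r
# Hypothetical greedy_cover(): same greedy strategy as find10(), but the gain of
# every column is a matrix product with a 0/1 vector of still-uncovered rows.
greedy_cover <- function(mat, n) {
  uncovered <- rep(1, nrow(mat))  # 1 = row not yet covered
  chosen <- integer(0)
  for (i in seq_len(n)) {
    gains <- drop(crossprod(mat, uncovered))  # uncovered rows hit by each column
    gains[chosen] <- -1                       # never pick a column twice
    best <- which.max(gains)
    chosen <- c(chosen, best)
    uncovered[mat[, best] != 0] <- 0          # rows covered by `best` are done
  }
  sort(chosen)
}

# Counterexample: greedy takes column 1 (4 rows) first and ends up covering
# 5 of 6 rows, while columns 2 and 3 together cover all 6.
cx <- cbind(c(0, 1, 1, 1, 1, 0),  # column 1: rows 2-5
            c(1, 1, 1, 0, 0, 0),  # column 2: rows 1-3
            c(0, 0, 0, 1, 1, 1))  # column 3: rows 4-6
greedy <- greedy_cover(cx, 2)
greedy
sum(rowSums(cx[, greedy, drop = FALSE]) > 0)  # rows covered by greedy
sum(rowSums(cx[, c(2, 3)]) > 0)               # rows covered by the optimum
```

On the question's 4-row example, greedy_cover(mat, 2) also selects columns 2 and 3.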
An example:
m <- matrix(sample(0:1, 100, prob = c(0.8, 0.2), replace = TRUE), nrow = 10)  # random, so your m will differ
m
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 1 0 0 0 0 0 1 1 0
[2,] 1 0 0 0 0 0 0 0 1 1
[3,] 0 0 0 0 1 0 0 0 0 0
[4,] 0 0 0 1 0 1 0 1 0 1
[5,] 0 0 0 0 1 0 0 1 0 0
[6,] 0 0 0 1 0 1 1 0 0 0
[7,] 0 0 1 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 1 0
[9,] 0 0 0 0 0 0 0 1 0 0
[10,] 0 0 0 0 0 0 0 0 0 0
find10(m,5)
[1] 3 4 5 8 9
For the example in your question, it also returns columns 2 and 3.