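I have two large sparse matrices (say A and B). I want to replace non-zero elements of A with zero based on B, where B holds the rank of each non-zero element within its column. The output matrix should keep the top n and bottom n ranked elements of each column of A, and every other non-zero value should be set to zero.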
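To make the rule concrete, here is a minimal single-column sketch. The numbers are the non-zero values of one illustrative column and n = 2 is assumed; it only demonstrates the selection rule, not the sparse-matrix handling.

```r
# Non-zero values of one column (illustrative), keeping top 2 and bottom 2
x <- c(0.8, 0.9, 0.6, 0.3, 0.5)
n <- 2

r <- rank(-x)                          # rank 1 = largest value
middle <- r > n & r <= length(x) - n   # neither top n nor bottom n
x[middle] <- 0                         # only the middle-ranked value is zeroed
x
```

Here only 0.6 (rank 3 of 5) falls in the middle, so it is the only value replaced by zero.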
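Below is my approach. The function GetTopNBottomN uses loops, and I would like to know whether it can be optimized, because it takes a long time once the matrices get large.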
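library(Matrix)  # provides Matrix() and the sparse matrix classes
# Input matrix (7 rows x 6 columns, filled column by column)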
TestMatrix = Matrix(c(0.80,0.9,0.6,0,0,0.3,0.5,
0,0,0.3,0,0,0,0,
0.4,0.5,0.6,0,0,0.1,0,
0,0,0,0,0,0,0,
0.3,0.4,0.5,0.2,0.1,0.7,0.8,
0.6,0.7,0.5,0.8,0,0,0),7,sparse = TRUE)
# Function to generate within-column ranks of the non-zero values of the input matrix
GenerateRankMatrix <- function(aMatrix) {
  n <- diff(aMatrix@p)                                      # number of non-zeros per column
  lst <- split(aMatrix@x, rep.int(seq_len(ncol(aMatrix)), n)) # non-zero values, one list element per column
  r <- unlist(lapply(lst, function(x) rank(-x)), use.names = FALSE) # rank 1 = largest value in the column
  RankMatrix <- aMatrix                                     # copy the sparse matrix
  RankMatrix@x <- r                                         # replace non-zero values with their ranks
  return(RankMatrix)
}
## Function to retain the top n and bottom n values of each column
GetTopNBottomN <- function(aMatrix, rMatrix) {
  # aMatrix = original sparse matrix, rMatrix = rank matrix
  n = 2 # keep the top 2 and bottom 2 elements of every column
  for (j in seq_len(ncol(aMatrix))) {
    MaxValue = max(rMatrix[, j])
    if (MaxValue <= 2 * n) next # skip columns with at most 2*n non-zero values
    for (i in seq_len(nrow(aMatrix))) {
      # zero out values ranked strictly between the top n and the bottom n
      if (rMatrix[i, j] > n && rMatrix[i, j] <= MaxValue - n) {
        aMatrix[i, j] = 0
      }
    }
  }
  return(aMatrix)
}
# Generate the rank matrix
RankMatrix = GenerateRankMatrix(TestMatrix)
# Output matrix
GetTopNBottomN(TestMatrix, RankMatrix)
Answer 0 (score: 3)
I extract the indices of the non-zero elements and use ave() to compute the ranks within each column:
idx <- which(TestMatrix != 0, arr.ind=TRUE)
ranks = ave(-TestMatrix[idx], idx[,2], FUN=rank)
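As a small illustration of what ave() does here, the sketch below ranks values within groups. The vectors vals and grp are made-up stand-ins for TestMatrix[idx] and idx[,2].

```r
vals <- c(0.8, 0.9, 0.6, 0.3, 0.4, 0.5)  # stand-in for the non-zero values
grp  <- c(1, 1, 1, 2, 2, 2)              # stand-in for their column indices

# rank() is applied separately to each group; negating the values
# makes rank 1 correspond to the largest value in the group
grouped_ranks <- ave(-vals, grp, FUN = rank)
grouped_ranks
# 2 1 3 3 2 1
```

The result has the same length and order as the input, which is what makes it easy to line the ranks up with idx afterwards.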
What you actually need, though, is a flag for which values to keep:
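keep = ave(-TestMatrix[idx], idx[,2], FUN=function(elt) {
    elt = rank(elt)
    # 1 for values ranked strictly between the top 2 and bottom 2, else 0
    (elt > 2) & (elt <= length(elt) - 2)
}) == 0
idx = idx[keep, , drop=FALSE]  # drop=FALSE keeps idx a matrix even if one row remains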
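Then create a new sparse matrix from the retained entries (passing dims so that trailing all-zero rows or columns are not lost):
sparseMatrix(idx[,1], idx[,2], x=TestMatrix[idx], dims=dim(TestMatrix))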
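For larger matrices, the same idea can probably be applied directly to the compressed-column slots, which avoids building the index matrix. The following is only a sketch under assumptions: keep_top_bottom_n is a hypothetical helper name (not a Matrix function), the input is assumed to be a numeric sparse matrix convertible to column-compressed form, and ties are ranked with the default behaviour of rank().

```r
library(Matrix)

# Hypothetical helper: zero every non-zero entry whose within-column rank
# is neither among the top n nor among the bottom n.
keep_top_bottom_n <- function(m, n = 2) {
  m <- drop0(as(m, "CsparseMatrix"))          # column-compressed, no stored zeros
  counts <- diff(m@p)                         # non-zeros per column
  col_id <- rep.int(seq_len(ncol(m)), counts) # column of each stored value
  rk <- ave(-m@x, col_id, FUN = rank)         # rank 1 = largest value in its column
  middle <- rk > n & rk <= counts[col_id] - n # strictly between top n and bottom n
  m@x[middle] <- 0
  drop0(m)                                    # remove the zeros just written
}

TestMatrix <- Matrix(c(0.80,0.9,0.6,0,0,0.3,0.5,
                       0,0,0.3,0,0,0,0,
                       0.4,0.5,0.6,0,0,0.1,0,
                       0,0,0,0,0,0,0,
                       0.3,0.4,0.5,0.2,0.1,0.7,0.8,
                       0.6,0.7,0.5,0.8,0,0,0), 7, sparse = TRUE)

result <- keep_top_bottom_n(TestMatrix, n = 2)
result
```

Columns with at most 2*n non-zero values come through unchanged, because no rank can satisfy both conditions. The dimensions are preserved automatically since the original object is modified rather than rebuilt.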