Assigning values to an R sparse Matrix

Asked: 2016-12-01 10:18:37

Tags: r sparse-matrix

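Assigning values to a column of a sparse Matrix in R with the most straightforward syntax, M[,1] <- 0, is surprisingly slow. Following the advice in https://stat.ethz.ch/pipermail/r-help/2010-December/262365.html, I tried to nullify a column by rebuilding the matrix from the (i, j, x) triplets returned by summary(M) (see the nullify.column function). That is even slower in some cases and faster in others, depending on the number of non-zero values in the column being nullified: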

library(Matrix)

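# Drop every entry of column i by rebuilding M from its (i, j, x) triplets.
nullify.column <- function(M, i) {
  M.triplets <- summary(M)
  keep <- M.triplets$j != i
  # Pass dims explicitly; otherwise trailing empty rows/columns are dropped.
  return(sparseMatrix(i = M.triplets$i[keep], j = M.triplets$j[keep],
                      x = M.triplets$x[keep], dims = dim(M)))
}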

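# Build an n x p sparse matrix from q random (row, column) draws; duplicates are summed.
build.random.sparse.matrix <- function(n, p, q) {
  i <- sample(x = 1:n, replace = TRUE, size = q)
  j <- sample(x = 1:p, replace = TRUE, size = q)
  s <- rnorm(q)^2
  # Fix the dimensions so cbind() below never sees mismatched row counts.
  M <- sparseMatrix(i, j, x = s, dims = c(n, p))
  return(M)
}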

t0 <- Sys.time()

n <- 1000000
p <- 50000
sparse.ratio <- 0.001
q <- n*p * sparse.ratio
t1 <- Sys.time()
A <- build.random.sparse.matrix(n,p,q)
B <- build.random.sparse.matrix(n,1,q*2) # column to nullify with more non-zero-values
M <- cbind(B,A)
t2 <- Sys.time()
delta <- round(as.numeric(difftime(t2,t1,units="secs")),2)
print(paste(c("Building sparse matrix took ", delta, "s"), collapse=""))

t1 <- Sys.time()
M.bis <- nullify.column(M,1)
t2 <- Sys.time()
delta <- round(as.numeric(difftime(t2,t1,units="secs")),2)
print(paste(c("nullify.column took ", delta, "s"), collapse=""))

t1 <- Sys.time()
M[,1] <- 0
t2 <- Sys.time()
delta <- round(as.numeric(difftime(t2,t1,units="secs")),2)
print(paste(c( "M[,i] <- 0 took ", delta, "s"), collapse=""))

t3 <- Sys.time()
delta <- round(as.numeric(difftime(t3,t0,units="secs")),2)
print(paste(c( "overall time ", delta, "s"), collapse=""))

This returns:

[1] "Building sparse matrix took 58.86s"
[1] "nullify.column took 31.2s"
[1] "M[,i] <- 0 took 186.55s"
[1] "overall time 278.49s"

Any ideas? This M[,1] <- 0 takes longer than an advanced SVD algorithm and ruins my overall performance.
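For comparison, here is a minimal sketch of a third way to nullify a column, assuming M is a dgCMatrix (the class sparseMatrix() returns by default for numeric x). It overwrites the column's stored values through the @p and @x slots and then calls drop0() to remove the explicit zeros. The name zero.column and the small test matrix are hypothetical, and I have not timed this at the sizes used above.

```r
library(Matrix)

# Hypothetical helper: zero out column j of a dgCMatrix via its slots.
# M@p holds 0-based offsets, so the stored values of column j live at
# positions (M@p[j] + 1) .. M@p[j + 1] of M@x.
zero.column <- function(M, j) {
  stopifnot(inherits(M, "dgCMatrix"), j >= 1, j <= ncol(M))
  first <- M@p[j] + 1L
  last <- M@p[j + 1L]
  if (last >= first) {          # skip columns that are already empty
    M@x[first:last] <- 0
  }
  drop0(M)                      # remove the explicit zeros just written
}

# Small usage example on a 3 x 3 matrix.
M.small <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 2, 3),
                        x = c(1, 2, 3, 4), dims = c(3, 3))
Z <- zero.column(M.small, 1)
print(Z)
```

The dimensions are preserved because only the stored values change; the other columns are not copied through an intermediate triplet table.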

Update

Just to check, I tried the following code:

# test 2
M <- cbind(B,A)
N <- t(M)

t1 <- Sys.time()
N[1,] <- 0
t2 <- Sys.time()
delta <- round(as.numeric(difftime(t2,t1,units="secs")),2)
print(paste(c( "t(M)[,i] <- 0 took ", delta, "s"), collapse=""))

This returns:

[1] "t(M)[,i] <- 0 took 2.05s"

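It seems that the sparse matrix is row-oriented rather than column-oriented, and that nullifying a row is much faster than nullifying a column.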
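One way to test that guess is to inspect the storage class directly. The sketch below uses a small hypothetical matrix rather than the matrices above. By default sparseMatrix() returns a dgCMatrix, which the Matrix documentation describes as compressed, column-oriented storage, so the timing gap may have a cause other than row orientation.

```r
library(Matrix)

# Small hypothetical matrix, only used to inspect the storage layout.
M.small <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2),
                        x = c(1, 2, 3), dims = c(3, 2))

class(M.small)      # storage class of the matrix
M.small@p           # 0-based column pointers (one more entry than columns)
diff(M.small@p)     # number of stored values in each column
class(t(M.small))   # storage class after transposing
```

If class() reports a "C" class for both M and N, both objects are stored column by column, and the row assignment on N is fast for some other reason.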

0 answers:

No answers.