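I am trying to take the element-wise maximum of two matrices of class "Matrix" (sparse matrices). I tried the pmax(...) function, which seems to work for two "normal" matrices, but when I pass in two sparse matrices it gives the following error on R 2.15.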
library(Matrix)
# Loading required package: lattice
v=Matrix(0,100,100); v[1,1]=1;
x=v
pmax(v,x)
# Error in pmax(v, x) : (list) object cannot be coerced to type 'logical'
# In addition: Warning message:
# In any(nas) : coercing argument of type 'list' to logical
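Answer 0 (score: 8)
As you have discovered, pmax does not support sparse matrices. The reason is that cbind does not support sparse matrices. The authors of Matrix wrote cBind, which is the equivalent of cbind for these classes. If you change one line in the pmax function, it works fine: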
pmax.sparse=function (..., na.rm = FALSE)
{
    elts <- list(...)
    if (length(elts) == 0L)
        stop("no arguments")
    if (all(vapply(elts, function(x) is.atomic(x) && !is.object(x),
        NA))) {
        mmm <- .Internal(pmax(na.rm, ...))
    }
    else {
        mmm <- elts[[1L]]
        attr(mmm, "dim") <- NULL
        has.na <- FALSE
        for (each in elts[-1L]) {
            attr(each, "dim") <- NULL
            l1 <- length(each)
            l2 <- length(mmm)
            if (l2 < l1) {
                if (l2 && l1%%l2)
                    warning("an argument will be fractionally recycled")
                mmm <- rep(mmm, length.out = l1)
            }
            else if (l1 && l1 < l2) {
                if (l2%%l1)
                    warning("an argument will be fractionally recycled")
                each <- rep(each, length.out = l2)
            }
            # nas <- cbind(is.na(mmm), is.na(each))
            nas <- cBind(is.na(mmm), is.na(each)) # Changed line.
            if (has.na || (has.na <- any(nas))) {
                mmm[nas[, 1L]] <- each[nas[, 1L]]
                each[nas[, 2L]] <- mmm[nas[, 2L]]
            }
            change <- mmm < each
            change <- change & !is.na(change)
            mmm[change] <- each[change]
            if (has.na && !na.rm)
                mmm[nas[, 1L] | nas[, 2L]] <- NA
        }
    }
    mostattributes(mmm) <- attributes(elts[[1L]])
    mmm
}
pmax.sparse(x,v)
# Works fine.
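Answer 1 (score: 7)
Try this. It concatenates the summary outputs of the matrices, then takes the maximum after grouping by (i, j) pair. It is also general in the sense that it can perform any kind of element-wise operation: just replace max with the function of your choice (or write a generic function that takes a FUN argument).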
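pmax.sparse <- function(..., na.rm = FALSE) {
  mats <- list(...)

  # check that all matrices have conforming sizes
  num.rows <- unique(sapply(mats, nrow))
  num.cols <- unique(sapply(mats, ncol))
  stopifnot(length(num.rows) == 1)
  stopifnot(length(num.cols) == 1)

  # stack the (i, j, x) triplets of all matrices, then keep the largest x per cell
  cat.summary <- do.call(rbind, lapply(mats, summary))
  # na.rm is passed by name; positionally it would become an extra value for max()
  out.summary <- aggregate(x ~ i + j, data = cat.summary, FUN = max, na.rm = na.rm)

  # note: a cell stored in only some inputs is not compared against the implicit 0
  sparseMatrix(i = out.summary$i,
               j = out.summary$j,
               x = out.summary$x,
               dims = c(num.rows, num.cols))
}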
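If your matrices are so large and insufficiently sparse that this code is too slow for your needs, I would consider a similar approach using data.table.
Here is an example application: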
N <- 1000000
n <- 10000
M1 <- sparseMatrix(i = sample(N,n), j = sample(N,n), x = runif(n), dims = c(N,N))
M2 <- sparseMatrix(i = sample(N,n), j = sample(N,n), x = runif(n), dims = c(N,N))
M3 <- sparseMatrix(i = sample(N,n), j = sample(N,n), x = runif(n), dims = c(N,N))
system.time(p <- pmax.sparse(M1,M2,M3))
# user system elapsed
# 2.58 0.06 2.65
The other proposed solution fails on this example with:
Error in .class1(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 106
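The suggestion above of a generic function with a FUN argument could be sketched roughly as follows. This is only an illustration, not code from the answer: the names elementwise.sparse and pad are hypothetical, and the sketch assumes general (non-symmetric) sparse matrices in which each cell is stored at most once. Unlike the version above, it pads each group with zeros for the inputs that do not store that cell, so negative entries are compared against the implicit 0.

```r
library(Matrix)

# Hypothetical helper (not part of Matrix): combine sparse matrices cell by
# cell with FUN, treating a cell that an input does not store as 0.
elementwise.sparse <- function(..., FUN = max) {
  mats <- list(...)
  k <- length(mats)
  dims <- dim(mats[[1L]])
  stopifnot(all(vapply(mats, function(m) identical(dim(m), dims), NA)))

  # Stack the (i, j, x) triplets of every input into one data frame.
  trip <- do.call(rbind, lapply(mats, function(m) as.data.frame(summary(m))))

  # For each cell, add one 0 per input that does not store it, then apply FUN.
  pad <- function(x) FUN(c(x, rep(0, k - length(x))))
  out <- aggregate(x ~ i + j, data = trip, FUN = pad)

  # drop0() removes results that turned out to be exactly zero.
  drop0(sparseMatrix(i = out$i, j = out$j, x = out$x, dims = dims))
}

A <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(-1, 3), dims = c(3, 3))
B <- sparseMatrix(i = c(2, 3), j = c(2, 1), x = c(5, -2), dims = c(3, 3))

res.max <- elementwise.sparse(A, B)              # element-wise maximum
res.min <- elementwise.sparse(A, B, FUN = min)   # element-wise minimum
```

On small inputs such as A and B, the result can be checked against pmax or pmin applied to as.matrix(A) and as.matrix(B). The sketch fails if none of the inputs stores any entry, because aggregate then has no rows to group.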
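Answer 2 (score: 1)
This modifies flodel's answer (I cannot comment on answers directly) to speed up the computation for large matrices by using the data.table package.
Running the original version (flodel's):
> object.size(m1)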
# 131053304 bytes
> dim(m1)
# [1] 8031286 39
> object.size(m2)
# 131053304 bytes
> dim(m2)
# [1] 8031286 39
> system.time(pmax.sparse(m1, m2))
# user system elapsed
# 326.253 21.805 347.969
Change the computation of cat.summary, out.summary and the resulting matrix to:
cat.summary <- rbindlist(lapply(list(...), summary)) # that's data.table
out.summary <- cat.summary[, list(x = max(x)), by = c("i", "j")]
sparseMatrix(i = out.summary[,i],
             j = out.summary[,j],
             x = out.summary[,x],
             dims = c(num.rows, num.cols))
Running the modified version:
> system.time(pmax.sparse(m1, m2))
# user system elapsed
# 21.546 0.049 21.589
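For reference, the three modified statements can be assembled into a complete function along the following lines. This is a sketch under stated assumptions rather than the poster's exact code: the name pmax.sparse.dt is hypothetical, the size checks are carried over from flodel's answer, and the result equals the true element-wise maximum only when the stored values are non-negative, because a cell stored in just one input is not compared against the implicit 0.

```r
library(Matrix)
library(data.table)

# Hypothetical name; data.table variant of the summary-based approach.
pmax.sparse.dt <- function(...) {
  mats <- list(...)

  # check that all matrices have conforming sizes
  num.rows <- unique(sapply(mats, nrow))
  num.cols <- unique(sapply(mats, ncol))
  stopifnot(length(num.rows) == 1, length(num.cols) == 1)

  # stack the (i, j, x) triplets, then take the maximum x per (i, j) group
  cat.summary <- rbindlist(lapply(mats, function(m) as.data.frame(summary(m))))
  out.summary <- cat.summary[, list(x = max(x)), by = c("i", "j")]

  sparseMatrix(i = out.summary$i,
               j = out.summary$j,
               x = out.summary$x,
               dims = c(num.rows, num.cols))
}

# Small example with non-negative entries only.
M1 <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(2, 1), dims = c(3, 4))
M2 <- sparseMatrix(i = c(1, 3), j = c(1, 4), x = c(5, 7), dims = c(3, 4))
p <- pmax.sparse.dt(M1, M2)
```

The grouping step is where the speed-up reported above comes from: data.table groups by i and j without building the intermediate structures that aggregate does.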