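I have a large sparse matrix, 1M x 10 (1 million rows and 10 columns). I want to look at the values in each row and build a new vector from them: 1 if the row contains a 1, otherwise 0. Here is my code. Is there any way to optimize it?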
CreatenewVector <- function(TestMatrix){
newColumn = c()
for(i in 1:nrow(TestMatrix)){ ## Loop begins
Value = ifelse(1 %in% TestMatrix[i,],1,0)
newColumn = c(newColumn,Value)
} ##Loop ends
return(newColumn)
}
## Sample input: TestMatrix = matrix(c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0), byrow = TRUE, nrow = 4)
## Sample output: c(1, 1, 1, 0)
## In the input TestMatrix, each group of three values is one row. For instance, (1, 0, 0) is the first row, and so on.
Answer
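Assuming you are using a plain matrix object rather than a special sparse-matrix class, you should use rowSums:
rowSums(x == 1) > 0
where x is your matrix. This returns a logical vector; if you prefer 1/0 to TRUE/FALSE, you can easily coerce it to numeric with as.numeric().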
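A minimal sketch applying this to the sample matrix from the question, keeping the intermediate steps visible: x == 1 gives a logical matrix of the same shape, rowSums counts the TRUE values in each row, and > 0 asks whether a row has at least one. The names is_one, hits and result are only for illustration.

```r
# Sample input from the question: rows (1,0,0), (1,1,0), (0,0,1), (0,0,0)
TestMatrix <- matrix(c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0), byrow = TRUE, nrow = 4)

is_one <- TestMatrix == 1        # logical matrix, same shape as TestMatrix
hits   <- rowSums(is_one)        # number of 1s in each row: 1 2 1 0
result <- as.numeric(hits > 0)   # 1 if the row contains at least one 1, else 0

print(result)                    # [1] 1 1 1 0
```

The whole computation is three vectorized calls, with no R-level loop over rows, which is where the speed comes from.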
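To give a sense of the timings, I benchmarked first on a 1,000-element matrix (250 rows × 4 columns) and then on a 1,000,000-element matrix (250,000 rows × 4 columns), using these functions: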
gregor = function(x) {as.numeric(rowSums(x == 1L) > 0L)}
# original method in question
op1 = function(x){
newColumn = c()
for(i in 1:nrow(x)){ ## Loop begins
Value = ifelse(1 %in% x[i,],1,0)
newColumn = c(newColumn,Value)
} ##Loop ends
return(newColumn)
}
# modified original:
# eliminated unnecessary ifelse
# pre-allocated result vector (no growing in a loop!)
# saved numeric conversion to the end
op2 = function(x){
newColumn = logical(nrow(x))
for(i in 1:nrow(x)){ ## Loop begins
newColumn[i] = 1L %in% x[i,]
} ##Loop ends
return(as.numeric(newColumn))
}
bouncy = function(x) {
as.numeric(apply(x, 1, function(y) any(y == 1L)))
}
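Before comparing speeds, it is worth confirming that the candidates return the same vector. The sketch below uses compact copies of gregor, op2 and bouncy (op2 uses seq_len instead of 1:nrow, which behaves the same for non-empty matrices) and checks them against each other on a random matrix of the same shape as the first benchmark, plus against the question's sample output. The seed is arbitrary.

```r
# Compact copies of three of the functions above, used only to check that
# they agree before comparing their speed.
gregor <- function(x) as.numeric(rowSums(x == 1L) > 0L)

op2 <- function(x) {
  newColumn <- logical(nrow(x))          # pre-allocated, never grown
  for (i in seq_len(nrow(x))) newColumn[i] <- 1L %in% x[i, ]
  as.numeric(newColumn)
}

bouncy <- function(x) as.numeric(apply(x, 1, function(y) any(y == 1L)))

set.seed(1)  # arbitrary seed, only so the check is reproducible
x <- matrix(sample(c(0L, 1L), size = 1e3, replace = TRUE), ncol = 4)
stopifnot(identical(gregor(x), op2(x)), identical(gregor(x), bouncy(x)))

# The question's sample input should give c(1, 1, 1, 0)
TestMatrix <- matrix(c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0), byrow = TRUE, nrow = 4)
stopifnot(identical(gregor(TestMatrix), c(1, 1, 1, 0)))
```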
Here are the results for the 1,000-element (250-row) matrix:
library(microbenchmark)
n = 1e3  # number of elements; with ncol = 4 this gives 250 rows
x = matrix(sample(c(0L, 1L), size = n, replace = TRUE), ncol = 4)
microbenchmark(gregor(x), op1(x), op2(x), bouncy(x), times = 20)
# Unit: microseconds
# expr min lq mean median uq max neval cld
# gregor(x) 12.164 15.7750 20.14625 20.1465 24.8980 30.410 20 a
# op1(x) 1224.736 1258.9465 1345.46110 1275.6715 1338.0105 2002.075 20 d
# op2(x) 846.140 864.7655 935.46740 886.2425 951.4325 1287.075 20 c
# bouncy(x) 439.795 453.8595 496.96475 486.5495 508.0260 711.199 20 b
Using rowSums is the clear winner. I dropped op1 from the next test, on the 1,000,000-element (250,000-row) matrix:
n = 1e6  # number of elements; with ncol = 4 this gives 250,000 rows
x = matrix(sample(c(0L, 1L), size = n, replace = TRUE), ncol = 4)
microbenchmark(gregor(x), op2(x), bouncy(x), times = 30)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# gregor(x) 9.371777 10.02862 12.55963 10.61343 14.13236 27.70671 30 a
# op2(x) 822.171523 856.68916 937.23602 881.39219 1028.26738 1183.68569 30 c
# bouncy(x) 391.604590 412.51063 502.61117 502.02431 588.78785 656.18824 30 b
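At this size the relative margin in favor of rowSums is even larger: by median time it is about 47 times faster than bouncy and 83 times faster than op2, compared with roughly 24 and 44 times on the smaller matrix.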
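The answer above assumes a plain matrix. Since the question describes a sparse matrix, here is a sketch for the case where it is stored as a dgCMatrix from the Matrix package (an assumption; the question does not name a class). Instead of rowSums, it reads the stored nonzero entries directly, so it never builds a dense copy of the matrix. In a dgCMatrix, the slot i holds the 0-based row index of each stored entry and the slot x holds its value; unstored entries are zero and can never equal 1, so they can be skipped. The helper name rows_with_one_sparse is hypothetical.

```r
library(Matrix)

# Hypothetical helper: 1 for every row of a dgCMatrix that contains a 1, else 0.
rows_with_one_sparse <- function(m) {
  stopifnot(inherits(m, "dgCMatrix"))
  # m@i: 0-based row index of each stored entry; m@x: its value.
  hit_rows <- m@i[m@x == 1] + 1L             # 1-based rows holding a stored 1
  as.numeric(seq_len(nrow(m)) %in% hit_rows)
}

# The question's sample matrix, built directly in sparse form:
# nonzeros at (1,1), (2,1), (2,2), (3,3).
m <- sparseMatrix(i = c(1, 2, 2, 3), j = c(1, 1, 2, 3),
                  x = c(1, 1, 1, 1), dims = c(4, 3))
print(rows_with_one_sparse(m))                # [1] 1 1 1 0
```

The work is proportional to the number of stored nonzeros rather than to rows × columns, which is the main reason to keep the data sparse in the first place.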