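I want to create a matrix (A) whose elements are the means of every four rows of another matrix (B). For example, row 1 of A should hold the column-wise means of rows 1 to 4 of B. I currently do this with a loop, but the matrix is very large, which makes the loop slow. Is there a better way to do this? Here is an example: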
B = matrix(runif(10000, 0, 10), 100, 100)
A = matrix(0, floor(dim(B)[1]/4), dim(B)[2])
for (im in 1: floor(dim(B)[1]/4)){
A[im, ] = colMeans(as.matrix(B[c((((im - 1)*4) + 1):(im*4)), ]))
}
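Answer 0 (score: 5)
You can easily vectorize this with the rowsum function, which has a matrix method (the default one) and computes sums by group. Then divide by 4 to get the means: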
grps <- floor(dim(B)[1]/4)
rowsum.default(B[1:(grps*4),], rep(1:grps, each = 4), reorder = FALSE)/4
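As a quick sanity check, the grouped-sum approach can be compared with the explicit loop on a small matrix. This is only a sketch: the names B_small, idx, A_rowsum and A_loop and the seed are illustrative and not part of the original answer.

```r
# Sketch: check that rowsum()/4 reproduces the loop result on a small example.
set.seed(1)                                  # illustrative seed
B_small <- matrix(runif(40, 0, 10), 10, 4)   # 10 rows: 2 full groups plus 2 leftover rows

grps <- floor(nrow(B_small) / 4)             # number of complete 4-row groups
idx  <- rep(seq_len(grps), each = 4)         # group label per row: 1 1 1 1 2 2 2 2

# Vectorised: group sums over the first grps*4 rows, divided by the group size
A_rowsum <- rowsum(B_small[seq_len(grps * 4), , drop = FALSE], idx, reorder = FALSE) / 4

# Reference: the loop from the question
A_loop <- matrix(0, grps, ncol(B_small))
for (im in seq_len(grps)) {
  A_loop[im, ] <- colMeans(B_small[((im - 1) * 4 + 1):(im * 4), , drop = FALSE])
}

# rowsum() labels its rows by group, so drop the dimnames before comparing
stopifnot(isTRUE(all.equal(unname(A_rowsum), A_loop)))
```

The leftover rows (here rows 9 and 10) are dropped by the subsetting step, which matches what the loop in the question does.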
Benchmark
Since this is an optimization question, here are some benchmarks of all the proposed methods on a not-so-big data set:
library(zoo)
library(microbenchmark)
set.seed(123)
B <- matrix(runif(10000 * 100, 0, 10), 10000, 100)
OP <- function(B) {
  grps <- floor(dim(B)[1]/4)
  A = matrix(0, grps, dim(B)[2])
  for (im in 1: grps){
    A[im, ] = colMeans(as.matrix(B[c((((im - 1)*4) + 1):(im*4)), ]))
  }
  A
}
DA <- function(B){
  grps <- floor(dim(B)[1]/4)
  rowsum.default(B[1:(grps*4),], rep(1:grps, each = 4), reorder = FALSE)/4
}
JB <- function(B) as.matrix(aggregate(B, list(gl(ceiling(nrow(B)/4), 4, nrow(B))), mean)[, -1])
Thela <- function(B) tapply(B, list((row(B)-1) %/% 4,col(B)), FUN=mean)
RollApply <- function(B) rollapply(B, width = 4, by = 4, FUN = mean, by.column = TRUE)
microbenchmark(OP(B), DA(B), JB(B), RollApply(B), Thela(B), times = 10L)
# Unit: milliseconds
#          expr        min         lq       mean     median         uq        max neval cld
#         OP(B)   45.57121   48.93491   70.17095   55.77107   65.43564   168.7760    10 a
#         DA(B)   10.60941   10.87035   11.65232   11.36478   12.07908    14.1551    10 a
#         JB(B) 1753.39114 1773.83230 1868.60788 1837.47161 1900.38141  2076.5835    10  b
#  RollApply(B) 8946.90359 9009.45160 9380.62408 9294.98441 9450.16426 10922.2595    10    d
#      Thela(B) 4820.36079 4925.70055 5117.22822 5048.89781 5257.58619  5650.2391    10   c
So the OP's solution isn't so bad after all.
Answer 1 (score: 3)
You can achieve this with the zoo package and its rollapply function.
install.packages("zoo")
require(zoo)
B <- matrix(runif(100, 0, 10),10, 10)
# with for loop
A = matrix(0,floor(dim(B)[1]/4),dim(B)[2])
for (im in 1 : floor(dim(B)[1]/4)){
  A[im,] = colMeans(as.matrix(B[c((((im-1)*4)+1):(im*4)),]))
}
A
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]
[1,] 5.633970 4.092848 3.793473 5.437288 6.316069 4.714015 5.837214 7.150007 4.638332
[2,] 5.445271 2.024052 6.096939 6.165723 3.049140 4.928087 5.433291 5.674594 4.607373
        [,10]
[1,] 5.260153
[2,] 6.589873
# with rowsum @ David
grps <- floor(dim(B)[1]/4)
rowsum(B[1:(grps*4),], rep(1:grps, each = 4))/4
      [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]
1 5.633970 4.092848 3.793473 5.437288 6.316069 4.714015 5.837214 7.150007 4.638332
2 5.445271 2.024052 6.096939 6.165723 3.049140 4.928087 5.433291 5.674594 4.607373
     [,10]
1 5.260153
2 6.589873
# With rollapply
D = rollapply(B, width = 4, by = 4, FUN = mean, by.column = TRUE)
D
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]
[1,] 5.633970 4.092848 3.793473 5.437288 6.316069 4.714015 5.837214 7.150007 4.638332
[2,] 5.445271 2.024052 6.096939 6.165723 3.049140 4.928087 5.433291 5.674594 4.607373
        [,10]
[1,] 5.260153
[2,] 6.589873
Answer 2 (score: 1)
aggregate can do this too, but the result then needs to be coerced to a matrix:
as.matrix(aggregate(B, list(gl(ceiling(nrow(B)/4), 4, nrow(B))), mean)[, -1])
Note that if nrow(B) is not a multiple of 4, the result will contain a final row holding the column means of the last nrow(B) %% 4 rows.
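The partial last group can be seen on a small example. The sketch below uses a hypothetical 10-row matrix (B_small) so that two rows are left over. The truncation step at the end is one possible way to drop the incomplete group, assuming you want the same behaviour as the loop in the question; it is not part of the original answer.

```r
# Sketch: aggregate() keeps the incomplete last group.
B_small <- matrix(1:20, 10, 2)                         # 10 rows, not a multiple of 4
g <- gl(ceiling(nrow(B_small) / 4), 4, nrow(B_small))  # groups: 1 1 1 1 2 2 2 2 3 3
A_agg <- as.matrix(aggregate(B_small, list(g), mean)[, -1])
nrow(A_agg)   # 3: the last row averages only rows 9 and 10

# Possible workaround (assumption): truncate to complete groups before aggregating
full   <- floor(nrow(B_small) / 4) * 4                 # rows covered by complete groups
B_full <- B_small[seq_len(full), , drop = FALSE]
A_full <- as.matrix(aggregate(B_full, list(gl(full / 4, 4)), mean)[, -1])
nrow(A_full)  # 2: only complete groups remain
```

The workaround assumes at least one complete group of four rows exists.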
As indicated by @thelatemail, tapply can do the job more concisely:
tapply(B, list((row(B)-1) %/% 4,col(B)), FUN=mean)
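To make the grouping in the tapply call easier to follow, here is a small sketch on a hypothetical 8-by-3 matrix. The names B_small, row_grp, col_grp, A_tap and expected are illustrative only.

```r
# Sketch: how the two grouping factors in the tapply() call are built.
B_small <- matrix(1:24, 8, 3)

row_grp <- (row(B_small) - 1) %/% 4   # 0 for rows 1-4, 1 for rows 5-8 (same shape as B_small)
col_grp <- col(B_small)               # column index of every element

# One mean per (row group, column) cell
A_tap <- tapply(B_small, list(row_grp, col_grp), FUN = mean)

# Reference: column means of each 4-row block
expected <- rbind(colMeans(B_small[1:4, ]), colMeans(B_small[5:8, ]))

# tapply() adds dimnames taken from the group labels, so drop them before comparing
stopifnot(isTRUE(all.equal(unname(A_tap), expected)))
```

The call is compact, but it treats every matrix element as a separate observation, which is consistent with its slow timing in the benchmark from the first answer.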