我想执行一些复杂的多维数组乘法,其中乘以数组的特定边距。
考虑这个例子,我在群体的某些边缘普遍存在分组特征(A和B):
# setup data
random=runif(4)
group.prevalence <- aperm (array(c(random,1-random),
dim=c(2,2,2),
dimnames=list(age=c("young","old"),
gender=c("male","female"),
group=c("A","B"))) , c(3,1,2) )
group.prevalence
# A + B = 1
现在假设我有一群感兴趣的人......
population <- round(array(runif(4, min=100,max=200) %o% c(1,1*(1+random[1]),1*(1+random[1])^2),
dim=c(2,2,3), dimnames=list(age=c("young","old"),
gender=c("male","female"),
year=c("year1","year2","year3"))))
population
...我想计算“A”和“B”的流行程度。
糟糕的解决方案是在循环中填写所有内容:
# bad solution
grouped.population <- array(NA, dim=c(2,2,2,3),
dimnames=list(group=c("A","B"),
age=c("young","old"),
gender=c("male","female"),
year=c("year1","year2","year3")))
for (group in c("A","B"))
for(gender in c("male","female"))
for (age in c("young","old"))
grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]
但我认为某种应用可能会派上用场,可能是plyr's aly,因为结果的尺寸应该保留。我试过了:
library(plyr)
aaply(population, c(1,2), function(x) x * group.prevalence)
# too many dimensions
我欢迎任何建议。
答案 0 :(得分:1)
对于您的特定情况,您可以计算:
out <- rep(group.prevalence, times=last(dim(population))) *
rep(population, each=first(dim(group.prevalence)))
然后您可以设置此array
的尺寸:
array(out, dim=c(2,2,2,3),
dimnames=list(group=c("A","B"),
age=c("young","old"),
gender=c("male","female"),
year=c("year1","year2","year3")))
关键是通过维度转置和扩展/复制 对齐两个数组的维度,以填充缺少的维度在另一个数组中。一般来说,程序是:
(age,gender)
。group.prevalence
,置换维度(使用aperm
),以便所有非交叉维度(即group
)都是第一个。然后,复制该数组N
次(使用times
),其中N
是右侧参数的非相交维度(即year
)的大小, population
。population
,置换维度,使所有非交叉维度(即year
)最后。然后,复制数组M
次的每个元素(使用each
),其中M
是左手非交叉维度(即group
)的大小方论,group.prevalence
。(group, age, gender, year)
)。然后,您可以根据需要在输出中置换这些尺寸,以获得您想要的效果。作为支票:
# bad solution
grouped.population <- array(NA, dim=c(2,2,2,3),
dimnames=list(group=c("A","B"),
age=c("young","old"),
gender=c("male","female"),
year=c("year1","year2","year3")))
for (group in c("A","B"))
for(gender in c("male","female"))
for (age in c("young","old"))
grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]
# another approach
grouped.population2 <- array(rep(group.prevalence, times=last(dim(population))) *
rep(population, each=first(dim(group.prevalence))),
dim=c(2,2,2,3),
dimnames=list(group=c("A","B"),
age=c("young","old"),
gender=c("male","female"),
year=c("year1","year2","year3")))
# check
all.equal(grouped.population,grouped.population2)
##[1] TRUE
更新了基准:
library(microbenchmark)
f1 <- function(group.prevalence, population) {
grouped.population <- array(NA, dim=c(2,2,2,3),
dimnames=list(group=c("A","B"),
age=c("young","old"),
gender=c("male","female"),
year=c("year1","year2","year3")))
for (group in c("A","B")) {
for(gender in c("male","female")) {
for (age in c("young","old")) {
grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]}}}
}
f2 <- function(group.prevalence, population) {
grouped.population2 <- array(rep(group.prevalence, times=last(dim(population))) *
rep(population, each=first(dim(group.prevalence))),
dim=c(2,2,2,3),
dimnames=list(group=c("A","B"),
age=c("young","old"),
gender=c("male","female"),
year=c("year1","year2","year3")))
}
print(microbenchmark(f1(group.prevalence, population)))
##Unit: microseconds
## expr min lq mean median uq max neval
## f1(group.prevalence, population) 101.473 103.998 149.2562 106.8865 115.372 1185.32 100
print(microbenchmark(f2(group.prevalence, population)))
##Unit: microseconds
## expr min lq mean median uq max neval
## f2(group.prevalence, population) 66.392 67.672 70.19873 68.454 69.4205 173.284 100
我相信随着每个尺寸的尺寸和尺寸的增加,性能会更加分散。