R中的多维数组乘法

时间:2016-09-01 12:25:19

标签: r multidimensional-array

我想执行一些复杂的多维数组乘法,其中乘以数组的特定边距

考虑这个例子,我在群体的某些边缘普遍存在分组特征(A和B):

# setup data

random=runif(4)

group.prevalence <- aperm (array(c(random,1-random),
                  dim=c(2,2,2), 
                  dimnames=list(age=c("young","old"),
                                gender=c("male","female"),
                                group=c("A","B"))) , c(3,1,2) )

group.prevalence 
# A + B = 1

现在假设我有一群感兴趣的人......

population <- round(array(runif(4, min=100,max=200) %o% c(1,1*(1+random[1]),1*(1+random[1])^2), 
                          dim=c(2,2,3), dimnames=list(age=c("young","old"),
                                                      gender=c("male","female"),
                                                      year=c("year1","year2","year3"))))

population

...我想计算“A”和“B”的流行程度。

糟糕的解决方案是在循环中填写所有内容:

# bad solution
grouped.population <- array(NA, dim=c(2,2,2,3), 
                            dimnames=list(group=c("A","B"),
                                          age=c("young","old"),
                                          gender=c("male","female"),
                                          year=c("year1","year2","year3")))

for (group in c("A","B"))
  for(gender in c("male","female"))
    for (age in c("young","old")) 
      grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]

但我认为某种应用可能会派上用场,可能是plyr's aly,因为结果的尺寸应该保留。我试过了:

library(plyr)
aaply(population, c(1,2), function(x) x * group.prevalence)
# too many dimensions

我欢迎任何建议。

1 个答案:

答案 0 :(得分:1)

对于您的特定情况,您可以计算:

out <- rep(group.prevalence, times=last(dim(population))) * 
       rep(population, each=first(dim(group.prevalence)))

然后您可以设置此array的尺寸:

array(out, dim=c(2,2,2,3), 
      dimnames=list(group=c("A","B"),
                    age=c("young","old"),
                    gender=c("male","female"),
                    year=c("year1","year2","year3")))

关键是通过维度转置扩展/复制 对齐两个数组的维度,以填充缺少的维度在另一个数组中。一般来说,程序是:

  1. 确定相交的维度。这是(age,gender)
  2. 对于乘法的左侧参数group.prevalence,置换维度(使用aperm),以便所有非交叉维度(即group)都是第一个。然后,复制该数组N次(使用times),其中N是右侧参数的非相交维度(即year)的大小, population
  3. 对于乘法的右侧参数population,置换维度,使所有非交叉维度(即year)最后。然后,复制数组M次的每个元素(使用each),其中M是左手非交叉维度(即group)的大小方论,group.prevalence
  4. 然后只是(数组)相乘,这是矢量化和快速的。
  5. 结果的关节维度只是左侧参数的非交叉维度,后跟交叉维度,后跟右侧非交叉维度(即,(group, age, gender, year))。然后,您可以根据需要在输出中置换这些尺寸,以获得您想要的效果。
  6. 作为支票:

    # bad solution
    grouped.population <- array(NA, dim=c(2,2,2,3), 
                                dimnames=list(group=c("A","B"),
                                              age=c("young","old"),
                                              gender=c("male","female"),
                                              year=c("year1","year2","year3")))
    
    for (group in c("A","B"))
      for(gender in c("male","female"))
        for (age in c("young","old")) 
          grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]
    
    # another approach
    grouped.population2 <- array(rep(group.prevalence, times=last(dim(population))) * 
                                 rep(population, each=first(dim(group.prevalence))), 
                                 dim=c(2,2,2,3), 
                                 dimnames=list(group=c("A","B"),
                                               age=c("young","old"),
                                               gender=c("male","female"),
                                               year=c("year1","year2","year3")))
    
    # check
    all.equal(grouped.population,grouped.population2)
    ##[1] TRUE
    

    更新了基准:

    library(microbenchmark)
    
    f1 <- function(group.prevalence, population) {
      grouped.population <- array(NA, dim=c(2,2,2,3), 
                                  dimnames=list(group=c("A","B"),
                                                age=c("young","old"),
                                                gender=c("male","female"),
                                                year=c("year1","year2","year3")))
      for (group in c("A","B")) {
        for(gender in c("male","female")) {
          for (age in c("young","old")) {
            grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]}}}
    }
    
    f2 <- function(group.prevalence, population) {
      grouped.population2 <- array(rep(group.prevalence, times=last(dim(population))) * 
                                   rep(population, each=first(dim(group.prevalence))), 
                                   dim=c(2,2,2,3), 
                                   dimnames=list(group=c("A","B"),
                                                 age=c("young","old"),
                                                 gender=c("male","female"),
                                                 year=c("year1","year2","year3")))
    }
    
    print(microbenchmark(f1(group.prevalence, population)))
    ##Unit: microseconds
    ##                             expr     min      lq     mean   median      uq     max neval
    ## f1(group.prevalence, population) 101.473 103.998 149.2562 106.8865 115.372 1185.32   100
    print(microbenchmark(f2(group.prevalence, population)))
    ##Unit: microseconds
    ##                             expr    min     lq     mean median      uq     max neval
    ## f2(group.prevalence, population) 66.392 67.672 70.19873 68.454 69.4205 173.284   100
    

    我相信随着每个尺寸的尺寸和尺寸的增加,性能会更加分散。