Speed up functions involving (s)apply

Date: 2018-11-03 22:07:51

Tags: r performance apply sapply

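I profiled my code with the lineprof package and found that the bottlenecks are in three functions: perm.stat.list, G.hat, and emp.FDR. Judging from the profiler output, the common theme seems to be the use of (s)apply.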

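Below are simplified versions of my functions, together with code that generates a reproducible example using all three of them. I have added comments to explain what each function does and which inputs it expects.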

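I would like to speed up my code substantially: even with B=10, the procedure takes almost half an hour. The input is a large matrix (10000 x 10000), so speed matters. Ideally I would like to run B=5000 permutations, which will increase the computation time even further.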
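A rough operation count for the posted code shows how each part scales with B. It takes N = N1 + N2 = 10^4 rows and M = 10^4 columns, as in the example below, and counts only the dominant work:

$$
\begin{aligned}
\text{perm.stat.list:}\quad & B M \text{ calls to t.test} = 10 \cdot 10^4 = 10^5 \ (B=10), \quad 5000 \cdot 10^4 = 5\times 10^7 \ (B=5000)\\
& \text{plus one copy of the } N \times M \text{ matrix per permutation: } 10^8 \text{ doubles} \approx 800\ \text{MB}\\
\text{emp.FDR:}\quad & M \cdot M = 10^8 \text{ R-level function calls (sapply over t.vec, apply inside G.hat)}\\
& M \cdot B M = M^2 B \text{ comparisons} = 10^9 \ (B=10), \quad 5\times 10^{11} \ (B=5000)
\end{aligned}
$$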

Any tips for improving my code are appreciated.

    ### Functions ###
    perm.stat.list <- function(samp.dat,N1,N2,B){
      #Pre-allocate a list with one slot per permutation
      perm.list = vector("list", B)
      for (b in 1:B){
        #Permute the row "labels", preserving information across columns
        perm.dat.tmp = samp.dat[sample(nrow(samp.dat)),]

        #Compute the permutation-based test statistics
        #Need to save each (1 x M) permutation vector into a list
        perm.list[[b]] = apply(perm.dat.tmp,2,function(y) t.test(y[1:N1],y[(N1+1):(N1+N2)])$statistic)
      }
      return(perm.list)
    }

    G.hat = function(perm.mat,t){
      #Number of permutations
      B = nrow(perm.mat)
      #Proportion of permuted statistics in each COLUMN that exceed t (empirical upper tail at t)
      out = apply(perm.mat,2,function(x) sum(x>t,na.rm = TRUE))/B
      return(out)
    }

    emp.FDR <- function(t.vec,mat){
      #For each value in t.vec, apply G.hat function
      out = sapply(t.vec,function(i) sum(G.hat(mat,i),na.rm = TRUE)/max(sum(t.vec > i,na.rm = TRUE),1))
      return(out)
    }
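A minimal sketch of how G.hat and emp.FDR might be rewritten without the nested sapply/apply calls. It assumes perm.mat is the B x M numeric matrix that G.hat expects, so the list returned by perm.stat.list has to be stacked first, e.g. with do.call(rbind, ...). G.hat.fast and emp.FDR.fast are hypothetical names. The idea: sum(G.hat(mat, i)) is just the number of permuted values greater than i, divided by B. Sorting all permuted values once and counting with base R's findInterval() therefore yields every numerator in one call instead of M passes over the matrix.

```r
# G.hat.fast (hypothetical name): same result as G.hat, using one vectorised
# comparison plus colSums() instead of apply() over the columns
G.hat.fast <- function(perm.mat, t) {
  colSums(perm.mat > t, na.rm = TRUE) / nrow(perm.mat)
}

# emp.FDR.fast (hypothetical name): sort once, then count exceedances for all
# thresholds in t.vec at once. Assumes t.vec contains no NAs (t.vec built with
# sort() has none); an NA threshold gives NA here, whereas emp.FDR gives 0.
emp.FDR.fast <- function(t.vec, mat) {
  B <- nrow(mat)                        # number of permutations (rows)
  perm.sorted <- sort(as.vector(mat))   # sort() drops NAs, like na.rm = TRUE
  obs.sorted  <- sort(t.vec)
  # findInterval(x, v) = number of elements of sorted v that are <= x,
  # so length(v) minus that = number of elements strictly greater than x
  n.perm.above <- length(perm.sorted) - findInterval(t.vec, perm.sorted)
  n.obs.above  <- length(obs.sorted)  - findInterval(t.vec, obs.sorted)
  (n.perm.above / B) / pmax(n.obs.above, 1)
}
```

The cost is then dominated by sorting the B·M permuted values once, rather than by the roughly M^2 anonymous-function calls of the sapply/apply version. The result keeps the order of t.vec.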

    ### Generate reproducible example ###

    ### Global variables ###
    #Sample sizes (rows)
    N1=3000
    N2=7000
    #Number of columns
    M = 10000
    #Number of permutations
    B = 10

    ### Data ###
    set.seed(1)
    X1 = matrix(rnorm(N1*M),ncol=M)
    X2 = matrix(rnorm(N2*M),ncol=M)

    ### Combine data in one large matrix of size (N1+N2) rows x M columns ###
    samp.dat = rbind(X1,X2)

    ### Compute statistic for each column of samp.dat ###
    t.stats = apply(samp.dat,2,
                    function(x) t.test(x[1:N1],x[(N1+1):(N1+N2)])$statistic)

    ### Sort t.stats in decreasing order (not necessarily needed for computation) ###
    t.vec = sort(t.stats,decreasing=TRUE)

    ### Permutation matrix based on the data ###
    #perm.stat.list returns a list of B vectors; stack them into the B x M matrix that G.hat expects
    perm.mat = do.call(rbind, perm.stat.list(samp.dat=samp.dat,N1=N1,N2=N2,B=B))

    eFDR = emp.FDR(t.vec=t.vec,mat=perm.mat)
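One possible direction for perm.stat.list and the t.stats line is sketched below. It assumes the default Welch statistic of t.test (var.equal = FALSE) is what is wanted and replaces the per-column t.test calls with column sums. Shuffling the rows and taking the first N1 is equivalent in distribution to drawing a random subset of N1 rows as group 1. Each permutation then needs only a 0/1 indicator vector and two matrix-vector products via crossprod(), with no copy of samp.dat. welch.t.cols and perm.stat.mat are hypothetical names, and the random draws will not reproduce the original code's values for a given seed.

```r
# welch.t.cols (hypothetical name): Welch two-sample t statistic for every
# column of X at once, group 1 = rows where g is TRUE. X.sq = X^2 and the
# column totals are passed in so they can be computed only once.
welch.t.cols <- function(X, X.sq, g, S.tot = colSums(X), Q.tot = colSums(X.sq)) {
  n1 <- sum(g)
  n2 <- length(g) - n1
  w  <- as.numeric(g)                        # 0/1 weights selecting group 1
  S1 <- drop(crossprod(w, X));    S2 <- S.tot - S1   # per-column sums
  Q1 <- drop(crossprod(w, X.sq)); Q2 <- Q.tot - Q1   # per-column sums of squares
  m1 <- S1 / n1
  m2 <- S2 / n2
  v1 <- (Q1 - n1 * m1^2) / (n1 - 1)          # sample variances via sum(x^2) - n*mean^2
  v2 <- (Q2 - n2 * m2^2) / (n2 - 1)
  (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
}

# perm.stat.mat (hypothetical name): returns the B x M permutation matrix
# directly, drawing a random set of N1 rows as group 1 for each permutation
perm.stat.mat <- function(samp.dat, N1, N2, B) {
  N <- N1 + N2
  stopifnot(nrow(samp.dat) == N)
  X.sq  <- samp.dat^2                        # one extra N x M matrix, reused every time
  S.tot <- colSums(samp.dat)
  Q.tot <- colSums(X.sq)
  out <- matrix(NA_real_, nrow = B, ncol = ncol(samp.dat))
  for (b in seq_len(B)) {
    g <- logical(N)
    g[sample.int(N, N1)] <- TRUE
    out[b, ] <- welch.t.cols(samp.dat, X.sq, g, S.tot, Q.tot)
  }
  out
}
```

With the question's objects this would be called as perm.stat.mat(samp.dat, N1, N2, B), and the observed statistics as welch.t.cols(samp.dat, samp.dat^2, rep(c(TRUE, FALSE), c(N1, N2))). Precomputing samp.dat^2 costs one more N x M matrix (about 800 MB for 10000 x 10000 doubles). The one-pass variance can lose precision when column means are large relative to their standard deviations; centring the columns first avoids that and does not change the t statistics. Stacking several indicator vectors as columns of an N x b matrix would turn the loop into a few larger crossprod() calls, which may benefit further from an optimised BLAS.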
