Question

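I have profiled my code with the `lineprof` package and found that the bottlenecks are in three functions: `perm.stat.list`, `G.hat`, and `emp.FDR`. According to the profiler output, the common theme appears to be the use of `(s)apply`.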

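Below are simplified versions of my functions, along with code that generates a reproducible example using all three. I have added comments to explain what each function does and which inputs it expects.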

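I would like to speed up my code considerably, because even with `B=10` the procedure takes nearly half an hour of computation time. The input is a large matrix (10000 x 10000), so speed matters. Ideally I would like to run `B=5000` permutations, which will increase the computation time even further.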

Any tips for improving my code would be appreciated.

### Functions ###
    perm.stat.list <- function(samp.dat,N1,N2,B){
      perm.list = NULL
      for (b in 1:B){
        #Permute the row "labels", preserving information across columns
        perm.dat.tmp = samp.dat[sample(nrow(samp.dat)),]

        #Compute the permutation-based test statistics
        #Need to save each (1 x M) permutation vector into a list
        perm.list[[b]] = apply(perm.dat.tmp,2,function(y) t.test(y[1:N1],y[(N1+1):(N1+N2)])$statistic)
      }
      return(perm.list)
    }

    G.hat = function(perm.mat,t){
      #Number of permutations
      B = nrow(perm.mat)
      #Compute an empirical distribution along each COLUMN of permutation matrix
      out = apply(perm.mat,2,function(x) sum(x>t,na.rm = TRUE))/B
      return(out)
    }

    emp.FDR <- function(t.vec,mat){
      #For each value in t.vec, apply G.hat function
      out = sapply(t.vec,function(i) sum(G.hat(mat,i),na.rm = TRUE)/max(sum(t.vec > i,na.rm = TRUE),1))
      return(out)
    }
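
Since `G.hat` only counts, per column, how many entries exceed `t`, and `emp.FDR` only needs the sum of those counts, both seem expressible without `apply`/`sapply`. The following is a minimal sketch under stated assumptions, not a tested drop-in replacement: the names `G.hat.vec` and `emp.FDR.vec` are hypothetical, `mat` is assumed to be a numeric (B x M) matrix, and `t.vec` is assumed to contain no missing values.

```r
# Hypothetical vectorized counterparts of G.hat and emp.FDR
# (illustrative names, not part of the original code).

G.hat.vec <- function(perm.mat, t) {
  # perm.mat > t is a logical matrix; colSums counts the TRUE
  # entries in each column and skips NAs when na.rm = TRUE.
  colSums(perm.mat > t, na.rm = TRUE) / nrow(perm.mat)
}

emp.FDR.vec <- function(t.vec, mat) {
  B <- nrow(mat)
  # Sort the permutation statistics once; sort() drops NAs by default.
  perm.sorted <- sort(as.vector(mat))
  t.sorted <- sort(t.vec)
  # findInterval(x, v) gives the number of elements of sorted v that are <= x,
  # so length(v) minus that value is the number of elements strictly > x.
  num <- (length(perm.sorted) - findInterval(t.vec, perm.sorted)) / B
  den <- pmax(length(t.sorted) - findInterval(t.vec, t.sorted), 1)
  # Assumption: t.vec has no NAs (findInterval would return NA for them).
  num / den
}
```

The idea behind `emp.FDR.vec` is that `sum(G.hat(mat, i))` equals the number of entries of `mat` greater than `i`, divided by `B`, so one sort plus a binary search per threshold can stand in for a full pass over the matrix per threshold.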


### Generate reproducible example ###

    ### Global variables ###
    #Sample sizes (rows)
    N1=3000
    N2=7000
    #Number of columns
    M = 10000
    #Number of permutations
    B = 10

    ### Data ###
    set.seed(1)
    X1 = matrix(rnorm(N1*M),ncol=M)
    X2 = matrix(rnorm(N2*M),ncol=M)

    ### Combine data in one large matrix of size (N1+N2) rows x M columns ###
    samp.dat = rbind(X1,X2)

    ### Compute statistic for each column of samp.dat ###
    t.stats = apply(samp.dat,2,
                   function(x) t.test(x[1:N1],x[(N1+1):(N1+N2)])$statistic)

    ### Sort t.stats in decreasing order (not necessarily needed for computation) ###
    t.vec = sort(t.stats,decreasing=TRUE)

    ### Permutation matrix based on the data ###
    #perm.stat.list returns a list of B vectors; stack them into a (B x M) matrix,
    #since G.hat calls nrow() and apply() on its input
    perm.mat = do.call(rbind,perm.stat.list(samp.dat=samp.dat,N1=N1,N2=N2,B=B))

    eFDR = emp.FDR(t.vec=t.vec,mat=perm.mat)
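
Calling `t.test` once per column carries a lot of per-call overhead, and only the `$statistic` is kept. With its default settings `t.test` computes the Welch statistic, `(mean(x) - mean(y)) / sqrt(var(x)/n1 + var(y)/n2)`, which can be computed for all columns at once. The sketch below illustrates that idea; `col.welch.t` is a hypothetical helper name, and it assumes the data contain no missing values (unlike `t.test`, it does not drop NAs).

```r
# Hypothetical helper: Welch t statistics for every column at once.
# Rows 1..N1 form group 1 and rows (N1+1)..(N1+N2) form group 2.
# Assumption: dat is a numeric matrix without NAs.
col.welch.t <- function(dat, N1, N2) {
  g1 <- dat[1:N1, , drop = FALSE]
  g2 <- dat[(N1 + 1):(N1 + N2), , drop = FALSE]
  m1 <- colMeans(g1)
  m2 <- colMeans(g2)
  # Unbiased column variances from centered data;
  # sweep() subtracts each column's mean from that column.
  v1 <- colSums(sweep(g1, 2, m1)^2) / (N1 - 1)
  v2 <- colSums(sweep(g2, 2, m2)^2) / (N2 - 1)
  # Welch statistic, matching t.test(x, y)$statistic with default arguments
  (m1 - m2) / sqrt(v1 / N1 + v2 / N2)
}

# Possible usage inside the permutation loop (same row shuffling as before):
#   perm.list[[b]] = col.welch.t(samp.dat[sample(nrow(samp.dat)), ], N1, N2)
```

Note that `sweep` creates a centered copy of each group, so for a 10000 x 10000 input this trades extra memory for speed.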

Speeding up functions that involve (s)apply

0 answers: