Question

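I am trying to rewrite some old code to make it more efficient. I read somewhere that using apply should be faster than using a for loop, so I tried to do that. First, the old working code: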

dl=data.frame(replicate(16,1:15685849))
# In the line below, mean was originally sum, but that gave integer overflows. This is not the case in the real dataset, but for the purpose of this example mean will do.
sums<-mapply(mean, dl[,4:ncol(dl)], USE.NAMES=FALSE)
appel<-dl[,1:3]
for (i in 1:(ncol(dl)-3)){
  appel[,i+3]=dl[,i+3]/sums[i]
}
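
The comment in the code above mentions integer overflows when using sum. A minimal sketch of that behaviour, using a hypothetical, much smaller vector than the one in the question (`x`, `s_int` and `s_dbl` are illustrative names, not from the original code):

```r
# Illustrative only: a smaller vector than in the question, but its sum
# still exceeds .Machine$integer.max (2147483647).
x <- 1:100000

# Summing integers yields an integer result; on overflow R returns NA
# and emits a warning, which is suppressed here.
s_int <- suppressWarnings(sum(x))

# Converting to double first avoids the overflow.
s_dbl <- sum(as.numeric(x))
```

Converting to double before summing is one way around the overflow; switching to mean, as done above, is another.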

So far, no problems. I then tried to rewrite this code as a function, so that I can put it in an R package for private use. This is my attempt:

dl=data.frame(replicate(16,1:15685849))
depthnormalise=function(tonormtable, skipleftcol=3){
    sums<-mapply(mean, dl[,4:ncol(dl)], USE.NAMES=FALSE)
    dn=function(x){x/sums}
    tonormtable[,(skipleftcol+1):ncol(tonormtable)]=t(apply(tonormtable[,(skipleftcol+1):ncol(tonormtable)], 1, dn))
}
appel=depthnormalise(dl)

However, this makes me run out of memory.

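I have very little experience with apply, and I can't seem to find the right form for a table where I want to keep the first 3 columns and only change the ones after them. If more information is needed, please let me know before downvoting! If you only downvote, I won't get any better.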

Answer 1

Here is a working apply solution:

appel1 <- as.matrix(dl)
appel1[, -(1:3)] <- apply(appel1[, -(1:3)], 2, 
                          function(x) round(x / mean(x) * 1e6, digits=2))
all.equal(as.matrix(appel), appel1)
#[1] TRUE

However, as was said in the comments, it will not be faster than a well-written for loop. On my system it is slower.
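
For comparison, here is a minimal sketch of how the for loop from the question could be wrapped in a function. It is an assumption about what the asker wants, not code from the original thread. Two things differ from the attempt in the question: the body only uses the argument `tonormtable` (the attempt reads the global `dl`), and the table is returned explicitly (a function whose last expression is an assignment returns only the assigned value, invisibly). `dl_small` and `appel_small` are hypothetical names for a small stand-in dataset.

```r
# Sketch: divide every column after the first `skipleftcol` columns by that
# column's mean, working one column at a time so that no row-wise apply()
# or matrix transpose is needed.
depthnormalise <- function(tonormtable, skipleftcol = 3) {
  # Indices of the columns to normalise (empty if nothing is left over).
  cols <- setdiff(seq_len(ncol(tonormtable)), seq_len(skipleftcol))
  for (j in cols) {
    tonormtable[[j]] <- tonormtable[[j]] / mean(tonormtable[[j]])
  }
  tonormtable  # return the whole table, including the untouched columns
}

# Small stand-in for the 15685849-row table in the question.
dl_small <- data.frame(replicate(6, 1:10))
appel_small <- depthnormalise(dl_small)
```

The first three columns of `appel_small` are unchanged, and each remaining column has mean 1 after the division.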
