测量功能运行时完成功能的时间

时间:2012-04-12 20:41:28

标签: performance r optimization

我在R中运行一些功能,有时需要很长时间才能完成(从10分钟到4小时不等)。具体来说,我使用的是Rense Nieuwenhuis编写的函数forward.lmer(),可以找到here。我想知道R是否有任何方法告诉%完成操作。特别是,当操作已经运行超过一个小时,我想知道它的完成程度。

是否有一个通用函数可以让我知道任何给定函数的进度? 理想情况下我想知道的是,如果有这样的函数:

percentComplete()
forward.lmer(inputs)

然后会告诉我有关完成该功能的接近程度如何?

我尝试的第一件事是使用library(time)并执行以下操作:

time<-getTime()
function(inputs)
timeReport(time)

但这只是告诉我完成该功能需要多长时间。有没有办法知道函数在运行时如何进展(完成百分比)?

此外,我希望提高此功能的效率,但这是另一个问题。谢谢大家!

1 个答案:

答案 0 :(得分:5)

您可以使用txtProgressBar来跟踪您在某个过程中取得的进展。

我对你所引用的功能不太熟悉,知道它应该去哪里,但仅仅从眼球看,看起来它可能会花费一部分时间在循环中开始:

# Iteratively updating the model with addition of one block of variable(s)
# Also: extracting the loglikelihood of each estimated model
for(j in 1:length(blocks))

如果您要使用:

pb <- txtProgressBar(style=3)
for(j in 1:length(blocks))
  setTxtProgressBar(pb, j/length(blocks))
  ...
}
close(pb)

这可能会给你你想要的东西。请注意,某些样式进度条的某些显示效果比其他样式更好。如果使用我发布的代码输出看起来很有趣,那么在创建进度条时尝试不同的样式可能会有更多的运气。

R无法预先知道通用函数需要多长时间才能完成,因此这里没有通用的答案。这是您在每个循环中使用进度条发布的功能。

forward.lmer <- function(
  start.model, blocks,
  max.iter=1, sig.level=FALSE,
  zt=FALSE, print.log=TRUE)
  {

    # forward.lmer: a function for stepwise regression using lmer mixed effects models
    # Author: Rense Nieuwenhuis

    # Initialysing internal variables
    log.step <- 0
    log.LL <- log.p <- log.block <- zt.temp <- log.zt <- NA
    model.basis <- start.model

    # Maximum number of iterations cannot exceed number of blocks
    if (max.iter > length(blocks)) max.iter <- length(blocks)
      pb <- txtProgressBar(style=3)
      # Setting up the outer loop
      for(i in 1:max.iter)
      {
        #each iteration, update the progress bar.
        setTxtProgressBar(pb, i/max.iter)
        models <- list()

        # Iteratively updating the model with addition of one block of variable(s)
        # Also: extracting the loglikelihood of each estimated model
        for(j in 1:length(blocks))
        {
          models[[j]] <- update(model.basis, as.formula(paste(". ~ . + ", blocks[j])))
        }

        LL <- unlist(lapply(models, logLik))

        # Ordering the models based on their loglikelihood.
        # Additional selection criteria apply
        for (j in order(LL, decreasing=TRUE))
        {

          ##############
          ############## Selection based on ANOVA-test
          ##############

          if(sig.level != FALSE)
          {
            if(anova(model.basis, models[[j]])[2,7] < sig.level)
            {

              model.basis <- models[[j]]

              # Writing the logs
              log.step <- log.step + 1
              log.block[log.step] <- blocks[j]
              log.LL[log.step] <- as.numeric(logLik(model.basis))
              log.p[log.step] <- anova(model.basis, models[[j]])[2,7]

              blocks <- blocks[-j]

              break
            }
          }

          ##############
          ############## Selection based significance of added variable-block
          ##############

          if(zt != FALSE)
          {
            b.model <- summary(models[[j]])@coefs
            diff.par <- setdiff(rownames(b.model), rownames(summary(model.basis)@coefs))
            if (length(diff.par)==0) break
            sig.par <- FALSE

            for (k in 1:length(diff.par))
            {
              if(abs(b.model[which(rownames(b.model)==diff.par[k]),3]) > zt)
              {
                sig.par <- TRUE
                zt.temp <- b.model[which(rownames(b.model)==diff.par[k]),3]
                break
              }
            }

            if(sig.par==TRUE)
            {
              model.basis <- models[[j]]

              # Writing the logs
              log.step <- log.step + 1
              log.block[log.step] <- blocks[j]
              log.LL[log.step] <- as.numeric(logLik(model.basis))
              log.zt[log.step] <- zt.temp
              blocks <- blocks[-j]

              break
            }
          }
        }
  }
  close(pb)

  ## Create and print log
  log.df <- data.frame(log.step=1:log.step, log.block, log.LL, log.p, log.zt)
  if(print.log == TRUE) print(log.df, digits=4)

  ## Return the 'best' fitting model
  return(model.basis)
}