通过try()之类的命令超时R命令

时间:2011-10-25 14:44:31

标签: r

我正在并行运行大量迭代。某些迭代比其他迭代花费更多(比如说100倍)。我想把它们计算出来,但我宁愿不必深入研究函数背后的C代码(称之为fun.c)来做繁重的工作。我希望有一些类似于try()但有一个time.out选项。然后我可以做类似的事情:

for (i in 1:1000) {
    try(fun.c(args),time.out=60))->to.return[i]
}

因此,如果fun.c在某个迭代中花费的时间超过60秒,则修改后的try()函数会将其杀死并返回警告或其他内容。

有人有什么建议吗?提前谢谢。

4 个答案:

答案 0 :(得分:26)

请参阅此主题:http://r.789695.n4.nabble.com/Time-out-for-a-R-Function-td3075686.html

{p}和?evalWithTimeout包中的R.utils

以下是一个例子:

require(R.utils)

## function that can take a long time
fn1 <- function(x)
{
    for (i in 1:x^x)
    {
        rep(x, 1000)
    }
    return("finished")
}

## test timeout
evalWithTimeout(fn1(3), timeout = 1, onTimeout = "error") # should be fine
evalWithTimeout(fn1(8), timeout = 1, onTimeout = "error") # should timeout

答案 1 :(得分:13)

这听起来应该是应该通过向工作人员发送任务来管理的东西,而不是应该包含在工作线程中的东西。 multicore包支持某些功能的超时;据我所知,snow没有。

编辑:如果你真的非常渴望在工作线程中有这个,那么试试这个功能,灵感来自@ jthetzel的答案中的链接。

try_with_time_limit <- function(expr, cpu = Inf, elapsed = Inf)
{
  y <- try({setTimeLimit(cpu, elapsed); expr}, silent = TRUE) 
  if(inherits(y, "try-error")) NULL else y 
}

try_with_time_limit(sqrt(1:10), 1)                   #value returns as normal
try_with_time_limit(for(i in 1:1e7) sqrt(1:10), 1)   #returns NULL

您可能希望在超时时自定义行为。目前它只返回NULL

答案 2 :(得分:4)

您在评论中提到您的问题是C代码运行时间过长。根据我的经验,基于setTimeLimit / evalWithTimeout的纯粹基于R的超时解决方案都不能阻止C代码的执行,除非代码提供了中断到R的机会。

您还在评论中提到您正在对SNOW进行并行化。如果要并行化的计算机是支持分叉的操作系统(即不是Windows),则可以在命令的上下文中使用mcparallel(在parallel包中,派生自multicore) SNOW集群上的节点;逆也是真的BTW,你可以从multicore分叉的上下文触发SNOW集群。如果您没有通过SNOW进行并行化,如果需要超时C代码的机器可以分叉,那么(当然)这个答案也是成立的。

这适用于eval_fork opencpu使用的解决方案。查看eval_fork函数的正文下方,了解Windows中的黑客大纲以及该黑客的半版本。

eval_fork <- function(..., timeout=60){

  #this limit must always be higher than the timeout on the fork!
  setTimeLimit(timeout+5);      

  #dispatch based on method
  ##NOTE!!!!! Due to a bug in mcparallel, we cannot use silent=TRUE for now.
  myfork <- parallel::mcparallel({
    eval(...)
  }, silent=FALSE);

  #wait max n seconds for a result.
  myresult <- parallel::mccollect(myfork, wait=FALSE, timeout=timeout);

  #try to avoid bug/race condition where mccollect returns null without waiting full timeout.
  #see https://github.com/jeroenooms/opencpu/issues/131
  #waits for max another 2 seconds if proc looks dead 
  while(is.null(myresult) && totaltime < timeout && totaltime < 2) {
     Sys.sleep(.1)
     enddtime <- Sys.time();
     totaltime <- as.numeric(enddtime - starttime, units="secs")
     myresult <- parallel::mccollect(myfork, wait = FALSE, timeout = timeout);
  }

  #kill fork after collect has returned
  tools::pskill(myfork$pid, tools::SIGKILL);    
  tools::pskill(-1 * myfork$pid, tools::SIGKILL);  

  #clean up:
  parallel::mccollect(myfork, wait=FALSE);

  #timeout?
  if(is.null(myresult)){
    stop("R call did not return within ", timeout, " seconds. Terminating process.", call.=FALSE);      
  }

  #move this to distinguish between timeout and NULL returns
  myresult <- myresult[[1]];

  #reset timer
  setTimeLimit();     

  #forks don't throw errors themselves
  if(inherits(myresult,"try-error")){
    #stop(myresult, call.=FALSE);
    stop(attr(myresult, "condition"));
  }

  #send the buffered response
  return(myresult);  
}

Windows hack: 原则上,特别是对于SNOW中的工作节点,您可以通过拥有工作节点来完成类似的事情:

  1. 创建一个存储临时文件的变量
  2. 将他们的工作区(save.image)存储到已知位置
  3. 使用系统调用加载Rscript一个R脚本,该脚本加载节点保存的工作空间,然后保存结果(基本上是对R工作空间进行慢速内存分支)。
  4. 在每个工作节点上输入一个重复循环,查找结果文件,如果结果文件在设定的时间段后没有显示,则从循环中断并保存反映超时的返回值
  5. 否则,请成功完成外观并阅读保存的结果并准备好返回
  6. 我在很长一段时间之前为使用慢速内存副本的本地主机上的Windows上的mcparallel写了一些代码。我现在会以完全不同的方式写出它,但它可能会给你一个开始的地方,所以无论如何我都提供它。有些人需要注意,russmisc是我写的一个包,现在作为repsych在github上。 glibraryrepsych中的一个函数,如果软件包已经不可用,则会安装软件包(如果您的SNOW不在本地主机上,则可能很重要)。 ...当然,我还没有将这段代码用于/ years /,而且我最近还没有对它进行过测试 - 我共享的版本可能包含我在以后的版本中解决的错误

    # Farm has been banished here because it likely violates 
    # CRAN's rules in regards to where it saves files and is very
    # windows specific.  Also, the darn thing is buggy.
    
    #' Create a farm
    #'
    #' A farm is an external self-terminating instance of R to solve a time consuming problem in R.  
    #' Think of it as a (very) poor-person's multi-core.
    #' For a usage example, see checkFarm.
    #' Known issues:  May have a problem if the library gdata has been loaded.//
    #' If a farm produces warnings or errors you won't see them
    #' If a farm produces an error... it never will produce a result.
    #'
    #' @export
    #' @param commands A text string of commands including line breaks to run.  
    #' This must include the result being saved in the object farmName in the file farmResult (both are variables provided by farm() to the farm).
    #' @param farmName This is the name of the farm, used for creating and destroying filenames.  One is randomly assigned that is plausibly unique.
    #' @param Rloc The location of R.exe.  The default loads the version of R that is stored in the windows registry as being \"current\".
    #' @return The farm name is returned to be stored in an object and then used in checkFarm()
    #' @seealso \code{\link{checkFarm}} \code{\link{waitForFarm}}
    farm <- function(commands,farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL)
    {
      if (is.null(Rloc)) {Rloc <- paste('\"',readRegistry(paste("Software\\R-core\\R\\",readRegistry("Software\\R-core\\R\\",maxdepth=100)$`Current Version`,"\\",sep=""))$InstallPath,"\\bin",sep="")}
      Rloc <- paste(Rloc,"\\R.exe\"",sep="")
      farmRda <- paste(farmName,".Rda",sep="")
        farmRda.int <- paste(farmName,".int.Rda",sep="") #internal .Rda
        farmR <- paste(farmName,".R",sep="")
        farmResult <- paste(farmName,".res.Rda",sep="") #result .Rda
        unlink(c(farmRda,farmR,farmResult,farmRda.int))
        farmwd <- getwd()
        cat("setwd(\"",farmwd,"\")\n",file=farmR,append=TRUE,sep="")
        #loading the internals to get them, then loading the globals, then reloading the internals to make sure they have haven't been overwritten
      cat("
    load(\"",farmRda.int,"\")
    load(farmRda)
    load(\"",farmRda.int,"\")
            ",file=farmR,append=TRUE,sep="")
        cat("library(russmisc)\n",file=farmR,append=TRUE)
        cat("glibrary(",paste(c(names(sessionInfo()$loadedOnly),names(sessionInfo()$otherPkgs)),collapse=","),")\n",file=farmR,append=TRUE)
        cat(commands,file=farmR,append=TRUE)
        cat("
            unlink(farmRda)
            unlink(farmRda.int)
        ",file=farmR,append=TRUE,sep="")
        save(list = ls(all.names=TRUE,envir=.GlobalEnv), file = farmRda,envir=.GlobalEnv)
        save(list = ls(all.names=TRUE), file = farmRda.int)
        #have to drop the escaped quotes for file.exists to find the file
      if (file.exists(gsub('\"','',Rloc))) {
            cmd <- paste(Rloc," --file=",getwd(),"/",farmR,sep="")
        } else {
            stop(paste("Error in russmisc:farm: Unable to find R.exe at",Rloc))
        }
        print(cmd)
        shell(cmd,wait=FALSE)
        return(farmName)
    }
    NULL
    
    #' Check a farm
    #'
    #' See farm() for details on farms.  This function checks for a file based on the farmName parameter called farmName.res.Rda.
    #' If that file exists it loads it and returns the object stored by the farm in the object farmName.  If that file does not exist,
    #' then the farm is not done processing, and a warning and NULL are returned.  Note that a rapid loop through checkFarm() without Sys.sleep produced an error during development.
    #'
    #' @export
    #' @param farmName This is the name of the farm, used for creating and destroying filenames.  This should be saved from when the farm() is created
    #' @seealso \code{\link{farm}} \code{\link{waitForFarm}}
    #' @examples 
    #' #Example not run
    #' #.tmp <- "This is a test of farm()"
    #' #exampleFarm <- farm("
    #' #print(.tmp)
    #' #helloFarm <- 10+2
    #' #farmName <- helloFarm
    #' #save(farmName,file=farmResult)
    #' #")
    #' #example.result <- checkFarm(exampleFarm)
    #' #while (is.null(example.result)) {
    #' #    example.result <- checkFarm(exampleFarm)
    #' #    Sys.sleep(1)
    #' #}
    #' #print(example.result)
    checkFarm <- function(farmName) {
      farmResult <- paste(farmName,".res.Rda",sep="")
      farmR <- paste(farmName,".r",sep="")
      if (!file.exists(farmR)) {
        message(paste("Warning in russmisc:checkFarm:  There is no evidence that the farm '",farmName,"' exists (no .r file found).\n",sep=""))
      }
        if (file.exists(farmResult)) {
            load(farmResult)
        unlink(farmResult) #delete the farmResult file
        unlink(farmR)      #delete the script file
            return(farmName)
        } else {
            warning(paste("Warning in russmisc:checkFarm:  The farm '",farmName,"' is not ready.\n",sep=""))
            return(invisible(NULL))
        }
    }
    NULL
    
    #' Wait for a farm result
    #'
    #' This function repeatedly checks for a farm, when the farm is found it returns the harvest (the farm result object).
    #' If the farm terminated with an error or there is some other sort of coding error, waitForFarm will be an infinate loop. As
    #' \code{checkFarm} produces errors on checks when the harvest is not ready, waitForFarm hides these errors in the factory error-catching wrapper.
    #'
    #' @export
    #' @param farmName This is the name of the farm, used for creating and destroying filenames.  This should be saved from when the farm() is created
    #' @param noCheck If this value is TRUE the check for the farm's .r is skipped.  If it is FALSE, the existance of the appropriate .r is checked for before entering a potentially unending while loop.
    waitForFarm <- function(farmName,noCheck=FALSE) {
      f.checkFarm <- factory(checkFarm)
      farmR <- paste(farmName,".r",sep="")
      if (!file.exists(farmR) & !noCheck) {
        stop(paste("Error in russmisc:checkFarm:  There is no evidence that the farm '",farmName,"' exists (no .r file found).\n",sep=""))
      }
      repeat {
        harvest <- f.checkFarm(farmName)
        if (!is.null(harvest[[1]])) {break}
        Sys.sleep(1)
      }
        return(harvest[[1]])
    }
    NULL
    
    #' Create a one-line simple farm
    #'
    #' This is a convience wrapper function that uses farm to create a single farm appropriate for processing single line commands.
    #'
    #' @export
    #' @param command A single command
    #' @param farmName This is the name of the farm, used for creating and destroying filenames.  One is randomly assigned that is plausibly unique.
    #' @param Rloc The location of R.exe.  The default loads the version of R that is stored in the windows registry as being \"current\".
    #' @return The farm name is returned to be stored in an object and then used in checkFarm()
    #' @seealso \code{\link{farm}}, \code{\link{checkFarm}}, and \code{\link{waitForFarm}}
    #' @examples
    #' #Example not run
    #' #a <- 5
    #' #b <- 10
    #' #farmID <- simpleFarm("a + b")
    #' #waitForFarm(farmID)
    simpleFarm <- function(command,farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL) {
      return(farm(paste("farmName <- (",command,");save(farmName,file=farmResult)",collapse=""),farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL))
    }
    NULL
    

答案 3 :(得分:0)

我喜欢R.utils::withTimeout(),但我也希望尽量避免依赖软件包。这是基于R的解决方案。请注意on.exit()调用。即使您的表达式抛出错误,它也可以确保删除时间限制。

with_timeout <- function(expr, cpu, elapsed){
  expr <- substitute(expr)
  envir <- parent.frame()
  setTimeLimit(cpu = cpu, elapsed = elapsed, transient = TRUE)
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  eval(expr, envir = envir)
}