Question

I have a computation like this (note that this is a heavily simplified version, a minimal reproducible example!):

computation <- function() # simplified version!
{
    # a lot of big matrices here....
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

    exp.value <- 4.5
    prior <- function (x) rep(exp.value, nrow(x))

    # after computation, it returns the model
    list(
        some_info = 5.18,
        prior = prior
    )
}

This function fits and returns a model, which I would like to save to disk:

m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713946

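Unfortunately, as you can see, the file is far too large: it contains the entire closure of the prior() function, and that closure holds all the data from the computation() function, including big_matrix (my full code has many of these).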
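To see what the closure is holding on to, one can list the contents of the function's environment. The following is a minimal sketch, assuming a smaller 200 x 200 matrix (chosen here only to keep the example fast) and using serialize() to measure the size in memory rather than writing a file:

```r
computation <- function() {
    # smaller stand-in for the big matrices (the size is an assumption, for speed)
    big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)

    exp.value <- 4.5
    prior <- function(x) rep(exp.value, nrow(x))

    list(some_info = 5.18, prior = prior)
}

m <- computation()

# every local created inside computation() is still reachable from prior()
captured <- ls(environment(m$prior))
print(captured)

# serialized size in bytes; dominated by the 200 * 200 doubles in big_matrix
size_bytes <- length(serialize(m, NULL))
print(size_bytes)
```

The listing should include big_matrix alongside exp.value and prior, which is why saving m drags the matrix along.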

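Now, I tried to solve this by redefining the environment (closure) of the prior function:

exp.value <- 4.5
environment(m$prior) <- list2env(list(exp.value = exp.value))
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 475

This works as expected! Unfortunately, when I put this cleanup code inside the computation() function (in fact, when I put it inside any function), it stops working! That is, with this line added inside computation():

environment(prior) <- list2env(list(exp.value = exp.value))

the file is huge again; the closure is not cleaned up properly.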

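I don't understand what is going on. Why does the cleanup code work when run outside any function, but stop working inside a function?
How can I make it work inside a function?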

Answer 1

One way to solve the problem is to remove the large variables from the environment before returning.

computation <- function() 
{
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

    exp.value <- 4.5
    prior <- function (x) rep(exp.value, nrow(x))

    rm(big_matrix) ## remove variable

    list(
        some_info = 5.18,
        prior = prior
    )
}

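The problem with the list2env approach is that, by default, the new environment's parent is the current environment (the frame list2env was called from), so you still capture everything in the function anyway. You can instead specify the global environment as the parent: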

computation <- function() 
{
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))
                                                              # explicit parent
  environment(prior) <- list2env(list(exp.value = exp.value), parent=globalenv()) 

  list(
    some_info = 5.18,
    prior = prior
  )
}

(If you specified emptyenv() instead, built-in functions such as rep() would not be found.)
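The difference between the default parent and an explicit one can be checked directly. This is a rough sketch, assuming a hypothetical helper make_prior() (not part of the original code) and a smaller 200 x 200 matrix; it builds the closure both ways inside a function and compares what each one still carries:

```r
# make_prior() is a hypothetical helper used only for this comparison
make_prior <- function(explicit_parent) {
    big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
    exp.value <- 4.5
    prior <- function(x) rep(exp.value, nrow(x))

    if (explicit_parent) {
        # parent is the global environment, so big_matrix is not reachable
        environment(prior) <- list2env(list(exp.value = exp.value),
                                       parent = globalenv())
    } else {
        # default parent is the calling frame, i.e. this function's environment
        environment(prior) <- list2env(list(exp.value = exp.value))
    }
    prior
}

leaky <- make_prior(explicit_parent = FALSE)
clean <- make_prior(explicit_parent = TRUE)

# the parent of the default environment still holds big_matrix
leaky_parent_has_matrix <- exists("big_matrix",
                                  envir = parent.env(environment(leaky)),
                                  inherits = FALSE)
clean_parent_is_global <- identical(parent.env(environment(clean)), globalenv())

# serialized sizes in bytes
leaky_size <- length(serialize(leaky, NULL))
clean_size <- length(serialize(clean, NULL))
print(c(leaky = leaky_size, clean = clean_size))
```

Both closures return the same values when called; only the chain of enclosing environments, and therefore the saved size, differs.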

Answer 2

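You might want to choose what to keep, rather than choose what to remove as MrFlick proposed; this leaves less room for mistakes in more complex code and may be less verbose.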

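I like to use on.exit() to declare this kind of action at the top of the function body, so that it is obvious when reading the code that the closure's environment matters, and the cleanup does not clutter the rest of the code: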

computation <- function() # simplified version!
{
  on.exit(rm(list=setdiff(ls(), "exp.value")))

  # a lot of big matrices here....
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))

  # after computation, it returns the model
  list(
    some_info = 5.18,
    prior = prior
  )
}
m <- computation()
file <- tempfile(fileext = ".Rdata")
save(m, file = file)
file.info(file)$size
#> [1] 2830
m$prior(data.frame(a=1:2))
#> [1] 4.5 4.5
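To confirm what the on.exit() cleanup leaves behind, one can inspect the closure's environment after the call. This is a small sketch under the same assumptions as the code above, with a smaller 200 x 200 matrix used only for speed:

```r
computation <- function() {
    # drop every local except exp.value when the function exits
    on.exit(rm(list = setdiff(ls(), "exp.value")))

    big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
    exp.value <- 4.5
    prior <- function(x) rep(exp.value, nrow(x))

    # the return value is evaluated before the on.exit() expression runs
    list(some_info = 5.18, prior = prior)
}

m <- computation()

# names still present in the environment that prior() closes over
remaining <- ls(environment(m$prior))
print(remaining)
#> [1] "exp.value"
```

The environment is still the function's own frame, but only exp.value survives in it, so the saved object stays small.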

Answer 3

Since you are not doing functional programming here, this is a good use case for an R6 class:

library(R6)
Computation <- R6Class("Computation", list(
  exp.value = NULL,
  prior = function (x) rep(self$exp.value, nrow(x)),
  initialize = function(exp.value) {
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
    self$exp.value <- exp.value
  }
))

m <- Computation$new(4.5)
saveRDS(m, file = "/tmp/test.rds")
file.info("/tmp/test.rds")$size
# [1] 2585

m$prior(data.frame(1:10))
# [1] 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5

How do I clean up a function's closure (environment) when returning and saving it?
