Make an R function use an object in the global environment rather than a copy of the object

Date: 2017-11-21 02:56:55

Tags: r function memory large-data

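My (perhaps naive) understanding of R is that functions create temporary copies of objects in memory. If an object in the global environment is too large to copy, how can I still use functions to simplify my code? Or is the recommended practice to subset only the necessary parts of the large object and operate on those inside the function?

Example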

# load two objects with 10 million rows and 500 columns
big.object.1 <- readRDS(file = "previously.created.dataframe.1")
big.object.2 <- readRDS(file = "previously.created.dataframe.2")

# method 1 with memory use of ~xMB?
big.object.1$recoded.column <- ifelse(big.object.1$old.column > 0,
                                      big.object.1$recoded.column * 2,
                                      big.object.1$recoded.column * 0.5)

# method 2 with memory use of ~2xMB?
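new.column_function <- function(data, old.col, recoded.col) {
  # [[ ]] extracts each column as a vector, which is what ifelse() expects
  data[[recoded.col]] <- ifelse(data[[old.col]] > 0,
                                data[[recoded.col]] * 2,
                                data[[recoded.col]] * 0.5)
  data  # return the modified data frame so the caller can keep the result
}

# argument names must match the function definition (old.col, recoded.col)
big.object.1 <- new.column_function(data = big.object.1,
                                    old.col = 400,
                                    recoded.col = 401)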

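What is the best practice when the code gets complicated without functions, but memory becomes a problem with functions? How can I avoid copying large objects?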

2 answers:

Answer 0: (score: 0)

You can use <<- (instead of <-) to access an object in the global environment from inside a function:
new.column_function <- function(old.col, recoded.col) {

  # Rows where the old column is positive (big.object.1 is the global object)
  ind <- big.object.1[[old.col]] > 0

  # Do this
  big.object.1[[recoded.col]] <<- ifelse(ind,
                                         big.object.1[[recoded.col]] * 2,
                                         big.object.1[[recoded.col]] * 0.5)

  # OR do this instead (use one variant only, or the recode is applied twice)
  # big.object.1[[recoded.col]][ind]  <<- big.object.1[[recoded.col]][ind] * 2
  # big.object.1[[recoded.col]][!ind] <<- big.object.1[[recoded.col]][!ind] * 0.5

  # Don't think this behaves in the intended way...
  #         ifelse(big.object.1[old.col] > 0,
  #                                big.object.1[recoded.col] * 2,
  #                                big.object.1[recoded.col] * 0.5)
}

I don't know whether this is any better than using data.table.
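To see what the <<- approach does in practice, here is a minimal sketch on a small made-up data frame. The names toy.object and recode_global are illustrative only and do not come from the question. Note that <<- only changes where the result is bound; R may still copy the modified column internally, so this is not guaranteed to lower memory use.

```r
# Toy stand-in for the large data frame (hypothetical data)
toy.object <- data.frame(old.column     = c(-1, 2, 0, 5),
                         recoded.column = c(10, 10, 10, 10))

# Recode a column of the enclosing-scope `toy.object` from inside a function
recode_global <- function(old.col, recoded.col) {
  ind <- toy.object[[old.col]] > 0
  # <<- assigns to the existing `toy.object` found in an enclosing environment
  toy.object[[recoded.col]] <<- ifelse(ind,
                                       toy.object[[recoded.col]] * 2,
                                       toy.object[[recoded.col]] * 0.5)
  invisible(NULL)
}

recode_global("old.column", "recoded.column")
toy.object$recoded.column  # 5 20 5 20
```

Because the function depends on a variable name that is hard-coded in its body, it is harder to reuse and test than a function that takes the data as an argument and returns the modified result.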

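Answer 1: (score: 0)

You can try using the functions get and assign, so that you pass only the name of the object to the function rather than the whole object: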

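new.column_function <- function(nameOfData, old.col, recoded.col) {
  # Look the object up by name in the global environment
  data <- get(nameOfData, envir = globalenv())
  data[[recoded.col]] <- ifelse(data[[old.col]] > 0,
                                data[[recoded.col]] * 2,
                                data[[recoded.col]] * 0.5)
  # Write the modified object back under the same name
  assign(nameOfData, data, envir = globalenv())
}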

In this case, nameOfData is a string whose value is, for example, "big.object.1".
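A minimal sketch of the get/assign idea on a small made-up data frame follows. The names toy.object and recode_by_name, and the envir argument, are illustrative assumptions and not part of the answer. Since get() returns the object and the column is then modified locally, R may still copy at least the changed column, so this should not be read as a guaranteed way to avoid copies.

```r
# Toy stand-in for the large data frame (hypothetical data)
toy.object <- data.frame(old.column     = c(-1, 2, 0, 5),
                         recoded.column = c(10, 10, 10, 10))

# Modify an object identified by its name; `envir` defaults to the caller's
# environment, which is the global environment when called at top level
recode_by_name <- function(nameOfData, old.col, recoded.col,
                           envir = parent.frame()) {
  data <- get(nameOfData, envir = envir)
  data[[recoded.col]] <- ifelse(data[[old.col]] > 0,
                                data[[recoded.col]] * 2,
                                data[[recoded.col]] * 0.5)
  # Rebind the name to the modified object
  assign(nameOfData, data, envir = envir)
  invisible(NULL)
}

recode_by_name("toy.object", "old.column", "recoded.column")
toy.object$recoded.column  # 5 20 5 20
```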