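My (perhaps naive) understanding of R is that functions create temporary copies of objects in memory. If an object in the global environment is too large to copy, how can I still use functions to simplify my code? Or is the recommended practice to subset only the necessary part of the large object so that a function can operate on it?
Example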
# load two objects with 10 million rows and 500 columns
big.object.1 <- readRDS(file = "previously.created.dataframe.1")
big.object.2 <- readRDS(file = "previously.created.dataframe.2")
# method 1 with memory use of ~xMB?
big.object.1$recoded.column <- ifelse(big.object.1$old.column > 0,
                                      big.object.1$recoded.column * 2,
                                      big.object.1$recoded.column * 0.5)
# method 2 with memory use of ~2xMB?
new.column_function <- function(data, old.col, recoded.col) {
  data[[recoded.col]] <- ifelse(data[[old.col]] > 0,
                                data[[recoded.col]] * 2,
                                data[[recoded.col]] * 0.5)
  data  # return the modified object; otherwise the change is lost
}
big.object.1 <- new.column_function(data = big.object.1,
                                    old.col = 400,
                                    recoded.col = 401)
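What is the best practice when the code gets complicated without functions, but memory becomes a problem with functions? How can I avoid copying large objects?
Answer 0: (score: 0)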
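You can use <<- (instead of <-), which assigns to the object in the enclosing environment rather than to a local variable: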
new.column_function <- function(old.col, recoded.col) {
  # Logical index of the rows where the old column is positive
  ind <- big.object[[old.col]] > 0
  # Do this
  big.object[[recoded.col]] <<- ifelse(ind,
                                       big.object[[recoded.col]] * 2,
                                       big.object[[recoded.col]] * 0.5)
  # OR do this (instead of the line above, not in addition to it)
  # big.object[[recoded.col]][ind]  <<- big.object[[recoded.col]][ind] * 2
  # big.object[[recoded.col]][!ind] <<- big.object[[recoded.col]][!ind] * 0.5
  # Don't think this behaves in the intended way (the result is never assigned)...
  # ifelse(big.object[old.col] > 0,
  #        big.object[recoded.col] * 2,
  #        big.object[recoded.col] * 0.5)
}
I don't know whether this is better than using data.table.
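A minimal usage sketch of the <<- approach on a toy data frame. The data, the column names, and the function name recode_in_place are made up for illustration; whether this actually avoids a copy of a large object depends on R's copy-on-modify behaviour and is not measured here.

```r
# Toy stand-in for the large object; values and column names are hypothetical
big.object <- data.frame(old.column     = c(-1, 2, -3, 4),
                         recoded.column = c(10, 10, 10, 10))

# Hypothetical helper: recodes one column of big.object in place
recode_in_place <- function(old.col, recoded.col) {
  ind <- big.object[[old.col]] > 0
  # <<- writes to big.object in the enclosing environment,
  # so the function does not need to return anything
  big.object[[recoded.col]] <<- ifelse(ind,
                                       big.object[[recoded.col]] * 2,
                                       big.object[[recoded.col]] * 0.5)
  invisible(NULL)
}

recode_in_place("old.column", "recoded.column")
print(big.object$recoded.column)
```

Columns can be given by name, as here, or by position (for example 400 and 401 as in the question).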
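Answer 1: (score: 0)
You can try using the functions get and assign, so that you pass only the name of the object to the function rather than the whole object: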
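new.column_function <- function(nameOfData, old.col, recoded.col) {
  # Look up the object by its name in the global environment
  data <- get(nameOfData, envir = globalenv())
  data[[recoded.col]] <- ifelse(data[[old.col]] > 0,
                                data[[recoded.col]] * 2,
                                data[[recoded.col]] * 0.5)
  # Write the modified object back under the same name
  assign(nameOfData, data, envir = globalenv())
}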
In this case, nameOfData is a string whose value is, for example, "big.object.1".
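As a rough illustration of passing the name as a string, here is a small sketch on a toy data frame. The data, the function name recode_by_name, and the envir argument are assumptions added for the example and are not part of the answer above; the sketch also makes no claim about how much memory get and assign use on a large object.

```r
# Toy stand-in for big.object.1; the values are hypothetical
big.object.1 <- data.frame(old.column     = c(-1, 2),
                           recoded.column = c(10, 10))

# Hypothetical helper: looks the object up by name, recodes one column,
# and writes the result back under the same name
recode_by_name <- function(nameOfData, old.col, recoded.col,
                           envir = parent.frame()) {
  force(envir)  # resolve the caller's environment before doing anything else
  data <- get(nameOfData, envir = envir)
  data[[recoded.col]] <- ifelse(data[[old.col]] > 0,
                                data[[recoded.col]] * 2,
                                data[[recoded.col]] * 0.5)
  assign(nameOfData, data, envir = envir)
  invisible(NULL)
}

recode_by_name("big.object.1", "old.column", "recoded.column")
print(big.object.1$recoded.column)
```

Defaulting envir to the caller's environment is a design choice for the sketch, so that the same function also works when the object does not live in the global environment.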