I have a function that 1) loads some large CSV files, 2) processes those datasets, and then 3) puts them into a list and returns the list object. It looks like this:
library(data.table)
load_data <- function() {
  # Load the data
  foo <- data.table(iris)   # really this: foo <- fread("foo.csv")
  bar <- data.table(mtcars) # really this: bar <- fread("bar.csv")
  # Clean the data
  foo[, Foo := mean(Sepal.Length) + median(bar$carb)]
  # ... lots of code here
  # Put the datasets into a list
  datasets <- list(foo = foo[], bar = bar[])
  # Return the result
  return(datasets)
}
My concern is that when I build the list object I double the memory requirement, because I'm essentially creating a copy of each dataset. I could instead load the data straight into the list (e.g. datasets <- list(foo = fread("foo.csv"), bar = fread("bar.csv"))), but that is undesirable because the code gets long and messy from constantly having to write datasets$foo and datasets$bar.
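To make the worry concrete, here is a small check one could run (a sketch, assuming the data.table package is available; its address() function reports an object's memory address, so equal addresses would mean no copy was made):

library(data.table)

foo <- data.table(iris)
bar <- data.table(mtcars)
datasets <- list(foo = foo, bar = bar)

# Compare memory addresses: if they match, the list elements point at
# the same objects as foo and bar, i.e. building the list copied nothing
address(foo) == address(datasets$foo)  # TRUE
address(bar) == address(datasets$bar)  # TRUE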
Answer (score: 3):
You might want to look at Hadley's resource on memory usage in R, but as a quick note:
library(pryr)
mem_used()
#> 36.1 MB
foo <- iris
bar <- mtcars
mem_used() # Loading the datasets into objects requires some memory
#> 36.4 MB
foo["Foo"] <- mean(foo$Sepal.Length) + median(bar$carb)
mem_used() # Modifying requires some more memory
#> 36.6 MB
foo_list <- list(foo)
mem_used() # Adding to the list doesn't really (it's a few bytes)
#> 36.6 MB
Created on 2018-08-03 by the reprex package (v0.2.0).
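The same point can be seen without pryr using base R's tracemem(), which reports every copy made of an object (a sketch; it requires an R build with memory profiling enabled, which the standard CRAN binaries have). Wrapping the object in a list is silent, and a copy is only reported once an element is modified through the list (copy-on-modify):

foo <- data.frame(x = rnorm(1e5))
tracemem(foo)               # start reporting copies of foo
datasets <- list(foo = foo) # silent: building the list makes no copy
datasets$foo$x[1] <- 0      # tracemem now reports a copy
untracemem(foo)

And since the question's function modifies its data.tables with :=, which updates by reference, even that copy-on-modify duplication is avoided.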