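I have a list of 67 data frames. Each data frame has two columns: the first is named "Material" and the second is named after a month and year. Every data frame uses the same name for the first column, but no two data frames share the same name for the second column.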
> head(str(fy16_list))
List of 67
 $ April_FY11 :'data.frame': 1559 obs. of 2 variables:
  ..$ Material : chr [1:1559] "622-5129-105" "622-5129-109" "622-5129-203" "622-5129-223" ...
  ..$ April_FY11: chr [1:1559] "1 " NA "(3)" NA ...
 $ April_FY12 :'data.frame': 1721 obs. of 2 variables:
  ..$ Material : chr [1:1721] "622-5129-021" "622-5129-105" "622-5129-109" "622-5129-203" ...
  ..$ April_FY12: chr [1:1721] NA NA NA NA ...
 $ April_FY13 :'data.frame': 1189 obs. of 2 variables:
  ..$ Material : chr [1:1189] "122000-F15SA_1" "122000-F15SA_2" "987-9705-001" "822-1867-001" ...
  ..$ April_FY13: chr [1:1189] NA NA "-15" "15" ...
The list is only 5.2 Mb, so it really isn't very large, but for some reason, when I do this:
mydf <- Reduce(function(...) merge(..., all = TRUE), mylist)
I wait 5-10 minutes and then get an error message saying I have reached my memory limit!
Error: cannot allocate vector of size 88.7 Mb
Warning messages:
1: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, ... :
Reached total allocation of 8078Mb: see help(memory.size)
2: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, ... :
Reached total allocation of 8078Mb: see help(memory.size)
# The warning message repeats 12 times...
The data frame being created is 8 Gb in size! I have no idea why this happens. I also tried
mydf <- reshape::merge_all(mylist)
but the same thing happened.
When I do
everything goes smoothly.
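So I suspect this code doesn't scale, but honestly 5.2 Mb seems small, since I have worked with lists of more than 300 Mb before.
Any suggestions on how to get this to work?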
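One thing that may be worth checking, as a guess rather than a confirmed diagnosis: if a "Material" value appears more than once within a data frame, merge() pairs every repeated row with every matching row in the other data frame, so the row count can multiply at each of the 66 merges. The sketch below uses a small made-up list (toy_list is hypothetical, not the real fy16_list) to count repeated keys, run the same full outer merge, and compare it with a merge after keeping only the first row per "Material". Keeping the first row is only an assumption for illustration; summing or otherwise aggregating the repeated rows may be what the data actually needs.

```r
# Toy stand-in for the real list: three data frames keyed by "Material",
# each containing one repeated key (hypothetical data).
toy_list <- list(
  April_FY11 = data.frame(Material = c("A", "A", "B"),
                          April_FY11 = c("1", "2", "3"),
                          stringsAsFactors = FALSE),
  April_FY12 = data.frame(Material = c("A", "A", "B"),
                          April_FY12 = c("4", "5", "6"),
                          stringsAsFactors = FALSE),
  April_FY13 = data.frame(Material = c("A", "A", "C"),
                          April_FY13 = c("7", "8", "9"),
                          stringsAsFactors = FALSE)
)

# Number of repeated "Material" values in each data frame.
# Anything above 0 can multiply rows during a merge.
dup_counts <- vapply(toy_list,
                     function(d) sum(duplicated(d$Material)),
                     integer(1))

# Full outer merge as in the question: each repeated key is paired
# with every matching row on the other side.
naive <- Reduce(function(x, y) merge(x, y, by = "Material", all = TRUE),
                toy_list)

# Keep only the first row per "Material" (an assumption for illustration),
# then merge the same way.
deduped <- lapply(toy_list,
                  function(d) d[!duplicated(d$Material), , drop = FALSE])
clean <- Reduce(function(x, y) merge(x, y, by = "Material", all = TRUE),
                deduped)

print(dup_counts)
print(nrow(naive))
print(nrow(clean))
```

On this toy data the naive merge produces 10 rows while the deduplicated merge produces 3, one per distinct "Material". With 67 data frames of 1,000+ rows each, the same effect would compound at every step, which could plausibly explain a multi-gigabyte result from a 5.2 Mb list.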