I have an R program that combines 10 files, each 296 MB in size, and I have increased the memory limit to 8 GB (the size of my RAM):
--max-mem-size=8192M
When I run the program, I get this error:
In type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
Reached total allocation of 7646Mb: see help(memory.size)
Here is my R program:
S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt");
S2 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_401_800.txt");
S3 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_801_1200.txt");
S4 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1201_1600.txt");
S5 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1601_2000.txt");
S6 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2001_2400.txt");
S7 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2401_2800.txt");
S8 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2801_3200.txt");
S9 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3201_3600.txt");
S10 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3601_4000.txt");
options(max.print=154.8E10);
combine_result <- rbind(S1,S2,S3,S4,S5,S6,S7,S8,S9,S10)
write.table(combine_result,file="C:/sim_omega3_1_4000.txt",sep=";",
row.names=FALSE,col.names=TRUE, quote = FALSE);
Can anyone help me?
Thanks,
Shruti.
Answer 0 (score: 6)
I suggest incorporating the advice from the memory notes in ?read.csv2:

Memory usage:
These functions can use a surprising amount of memory when reading large files. There is extensive discussion in the ‘R Data Import/Export’ manual, supplementing the notes here.

Less memory will be used if ‘colClasses’ is specified as one of the six atomic vector classes. This can be particularly so when reading a column that takes many distinct numeric values, as storing each distinct value as a character string can take up to 14 times as much memory as storing it as an integer.

Using ‘nrows’, even as a mild over-estimate, will help memory usage. Using ‘comment.char = ""’ will be appreciably faster than the ‘read.table’ default.

‘read.table’ is not the right tool for reading large matrices, especially those with many columns: it is designed to read _data frames_ which may have columns of very different classes. Use ‘scan’ instead for matrices.
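As a minimal sketch of that advice applied to the first file (the column classes and the row-count over-estimate below are assumptions; check your own data first with str()):

peek <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt", nrows = 10)
str(peek)  # learn the real column names and classes from a small sample
S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt",
                colClasses = rep("numeric", ncol(peek)),  # assumption: all columns numeric
                nrows = 500000)                           # mild over-estimate of the row count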
Answer 1 (score: 3)
Memory allocation requires contiguous blocks. The size a file occupies on disk may not be a good guide to how large the object is once loaded into R. You can look at one of these S files with the function described in:
?object.size
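For example, to compare one chunk's in-memory footprint with its size on disk (re-using the first file from the question):

S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt")
print(object.size(S1), units = "Mb")                                   # in-memory size
file.info("C:/Sim_Omega3_results/sim_omega3_1_400.txt")$size / 1024^2  # on-disk size, MB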
Here is a function I use to see which objects are taking up the most space in R:

getsizes <- function() {
  z <- sapply(ls(envir = globalenv()), function(x) object.size(get(x)))
  (tmp <- as.matrix(rev(sort(z))[1:10]))
}
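Calling getsizes() returns a one-column matrix with the sizes, in bytes, of the ten largest objects in the global environment, largest first.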
Answer 2 (score: 1)
If you call remove(S1,S2,S3,S4,S5,S6,S7,S8,S9,S10) after computing combine_result, and then gc(), you may free enough memory. I have also found that, on Windows, running the script through Rscript seems to give access to more memory than running it through the GUI.
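A sketch of that pattern spliced into the original script (same objects and output path as in the question):

combine_result <- rbind(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)
remove(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)  # the ten pieces are now redundant
gc()                                             # return the freed memory before writing
write.table(combine_result, file = "C:/sim_omega3_1_4000.txt", sep = ";",
            row.names = FALSE, col.names = TRUE, quote = FALSE)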
Answer 3 (score: 1)
If these files are all in the same standard format and you want to do this in R, why bother reading and writing CSV at all? Use readLines/writeLines:
files_in <- file.path("C:/Sim_Omega3_results",c(
"sim_omega3_1_400.txt",
"sim_omega3_401_800.txt",
"sim_omega3_801_1200.txt",
"sim_omega3_1201_1600.txt",
"sim_omega3_1601_2000.txt",
"sim_omega3_2001_2400.txt",
"sim_omega3_2401_2800.txt",
"sim_omega3_2801_3200.txt",
"sim_omega3_3201_3600.txt",
"sim_omega3_3601_4000.txt"))
# Copy the first file wholesale so its header line is kept exactly once,
# then append each remaining file with its first (header) line dropped.
file.copy(files_in[1], out_file_name <- "C:/sim_omega3_1_4000.txt")
file_out <- file(out_file_name, "at")  # open the copy for appending, text mode
for (file_in in files_in[-1]) {
  x <- readLines(file_in)
  writeLines(x[-1], file_out)  # x[-1] skips the header line
}
close(file_out)
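Because this never parses any fields, it avoids holding ten large data frames in memory at once; only one file's lines are in memory at any time.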