How to use the available memory in R

Time: 2019-01-25 00:43:49

Tags: r parallel-processing doparallel

I'm running 64-bit R / RStudio on 64-bit Windows 10. The PC has 16 GB of RAM and 8 cores.

However, RStudio crashes while reading larger datasets, at a memory utilization of roughly 1.6 / 7 GB.

So I'm trying to use the parallel package to run the operation on multiple cores, but I'm making a mistake somewhere.

library("data.table")
library("lubridate")
library("parallel")
library("foreach")
library("doParallel")

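# Start a cluster with one worker per core, leaving one core free
cl <- makeCluster(detectCores() - 1)
# Register the cluster as the backend for foreach's %dopar%
registerDoParallel(cl, cores = detectCores() - 2)

# Read every file whose name contains "public" and row-bind the results
files = list.files(pattern="public")
myfiles = do.call(rbind, lapply(files, function(x) fread(x, colClasses=c(ID="character"))))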

I don't have much experience with parallel processing.

Could you tell me where I went wrong?
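In the code above the cluster is registered, but the files are read with lapply(), which runs in the main R session, so no work is sent to the workers; foreach's %dopar% is what dispatches iterations to a registered backend. (When a cluster object is passed to registerDoParallel(), its cores argument is not used.) A minimal sketch of a parallel read, assuming the same "public" file pattern and ID column as the question; the worker count of 2 is an assumption, not a recommendation:

```r
library(data.table)
library(foreach)
library(doParallel)   # also attaches parallel (makeCluster, stopCluster)

# Assumed starting point: 2 workers, scaled up later only if memory allows
n_workers <- 2
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Absolute paths, so workers can resolve the files whatever their working directory
files <- normalizePath(list.files(pattern = "public"))

# Each iteration runs on a worker; .packages loads data.table on that worker.
# foreach returns a list of data.tables, which is bound once in the main session.
parts <- foreach(f = files, .packages = "data.table") %dopar% {
  fread(f, colClasses = c(ID = "character"))
}
myfiles <- rbindlist(parts)

stopCluster(cl)
```

Parallel reading does not lower peak memory: each worker holds its own chunk, the chunks are serialized back to the main session, and rbindlist() then builds a combined copy while `parts` still exists. Recent data.table versions also already use multiple threads inside fread() (see getDTthreads()), so the speed gain may be small.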

Update:

R has no problem creating an 8 GB object in memory:

bigint <- integer(2^32 / 2)  # 2^31 integers x 4 bytes each = 8 GiB
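The 8 GB figure follows from the vector length and R's 4-byte integer type. A small sketch of the arithmetic, without allocating the large vector, plus an empirical check of the per-element cost with object.size() on a smaller vector (the length 1e6 is arbitrary):

```r
# Size of bigint, computed without allocating it
n_int <- 2^32 / 2                        # 2^31 elements
bytes_per_int <- 4                       # R integers are 32-bit
size_gib <- n_int * bytes_per_int / 2^30
size_gib                                 # 8

# object.size() includes a small fixed header, so the per-element cost
# on a smaller vector comes out slightly above 4 bytes
small <- integer(1e6)
per_elem <- as.numeric(object.size(small)) / length(small)
per_elem
```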

I'm still not sure what is limiting the data read.
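One way to narrow down where the read fails is to load the files one at a time and log each table's in-memory size before combining them. A minimal diagnostic sketch, assuming the same file pattern and ID column as the question; it reads serially on purpose so the log points at the file that tips memory over:

```r
library(data.table)

files <- list.files(pattern = "public")

# Read files one by one and record each table's in-memory size in MB
sizes_mb <- numeric(length(files))
parts <- vector("list", length(files))
for (i in seq_along(files)) {
  parts[[i]] <- fread(files[i], colClasses = c(ID = "character"))
  sizes_mb[i] <- as.numeric(object.size(parts[[i]])) / 2^20
  message(sprintf("%s: %.1f MB (running total %.1f MB)",
                  files[i], sizes_mb[i], sum(sizes_mb[seq_len(i)])))
}

# Binding needs room for the parts and the combined table at the same time
myfiles <- rbindlist(parts)
rm(parts)   # drop the per-file copies
gc()        # trigger garbage collection and report memory used by R
```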

Update 2:

I ran the diagnostics report. These are the errors I got:

24 Jan 2019 23:10:33 [rdesktop] ERROR system error 231 (All pipe instances are busy); OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
24 Jan 2019 23:11:47 [rdesktop] ERROR system error 231 (All pipe instances are busy); OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
24 Jan 2019 23:11:47 [rdesktop] ERROR system error 231 (All pipe instances are busy); OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
24 Jan 2019 23:13:39 [rdesktop] ERROR system error 231 (All pipe instances are busy); OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
24 Jan 2019 23:13:40 [rdesktop] ERROR system error 231 (All pipe instances are busy); OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
24 Jan 2019 23:13:41 [rdesktop] ERROR system error 232 (The pipe is being closed); OCCURRED AT: void rstudio::core::http::AsyncClient<SocketService>::handleWrite(const rstudio_boost::system::error_code&) [with SocketService = rstudio_boost::asio::windows::basic_stream_handle<>] C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/AsyncClient.hpp:342; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
24 Jan 2019 23:13:42 [rdesktop] ERROR system error 2 (The system cannot find the file specified); OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
24 Jan 2019 23:13:42 [rdesktop] ERROR system error 2 (The system cannot find the file specified); OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288

1 answer:

Answer 0 (score: 0)

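If the non-parallel process uses R GB of RAM, then running it in parallel on C cores will need roughly R * C GB of RAM. I suggest scaling up gradually: start with 2 cores and monitor your RAM usage.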
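The R * C estimate can be turned around to pick a starting core count for a given amount of RAM. A sketch with a hypothetical helper, max_workers(); the 4 GB reserved for the OS and main session and the 3 GB per-process figure are assumptions for illustration, not measurements:

```r
# Hypothetical helper: largest worker count C such that C * R fits in the RAM
# left after reserving some for the OS and the main R session.
max_workers <- function(total_gb, per_process_gb, reserve_gb = 4) {
  usable <- total_gb - reserve_gb
  max(0, floor(usable / per_process_gb))
}

# Assumed figures: 16 GB machine, each process needing about 3 GB
max_workers(16, 3)   # floor(12 / 3) = 4
```

The result is an upper bound to compare against detectCores(); starting lower and watching actual usage, as suggested above, is still the safer approach.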