Question

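I am writing a function that processes several very large data.tables, and I would like to parallelize it on a Windows machine.

I could use the snow package and call clusterExport to give every node in the cluster a copy of each data.table. However, this does not work in practice because it uses too much memory.

I would like to solve this by exporting a different subset of the data.tables to each node, but I cannot see how to do this with the snow package.

Here is a toy example of code that works but is memory-inefficient: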

library(snow)
dd <- data.frame(a = rep(1:5, each = 2), b = 11:20)
cl <- makeCluster(2, type = "SOCK")
clusterExport(cl = cl, "dd")
clusterApply(cl, x = c(2,7),  function(thresh) colMeans(dd[dd$a < thresh,]))
stopCluster(cl)

Here is an example of code that does not work, but illustrates how we would like to distribute subsets of dd to the nodes:

library(snow)
dd <- data.frame(a = rep(1:5, each = 2), b = 11:20)
cl <- makeCluster(2, type = "SOCK")

dd_exports <- lapply(c(2,7), function(thresh) dd[dd$a < thresh, ])
#Now we export the ith element of dd_exports to the ith node:
clusterExport(cl = cl, dd_exports) 
clusterApply(cl, x = c(2,7),  function(x) colMeans(dd))
stopCluster(cl)
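
For illustration only, here is a minimal sketch of one way the intent above might be expressed. It assumes base R's parallel package, which provides functions with the same names as snow (makeCluster, clusterApply, clusterCall, clusterEvalQ, stopCluster), and it assumes that indexing the cluster object with cl[i] yields a one-node cluster. The names thresholds, res1 and res2 are hypothetical. Note that the master process still holds the full table plus the subsets, so this only reduces memory use on the workers.

```r
library(parallel)  # assumption: used here in place of snow; the function names match

dd <- data.frame(a = rep(1:5, each = 2), b = 11:20)
thresholds <- c(2, 7)            # hypothetical name for the per-node cutoffs
cl <- makeCluster(2)

# Build one subset per node on the master.
dd_exports <- lapply(thresholds, function(thresh) dd[dd$a < thresh, ])

# Option 1: pass the subsets as the data argument of clusterApply.
# Each worker receives only the list element it is asked to process.
res1 <- clusterApply(cl, dd_exports, function(d) colMeans(d))

# Option 2: store the i-th subset under the name `dd` on the i-th worker,
# so that later calls on that worker can reuse it without resending the data.
for (i in seq_along(cl)) {
  clusterCall(cl[i],
              function(d) { assign("dd", d, envir = .GlobalEnv); NULL },
              dd_exports[[i]])
}
res2 <- clusterEvalQ(cl, colMeans(dd))  # each worker sees only its own `dd`

stopCluster(cl)
```

Option 1 is probably the simpler choice when each subset is needed only once. Option 2 may suit repeated computations on the same subset, since the data is sent to each worker a single time.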

Exporting different subsets of data.tables to each node in a cluster
