How to check if stopCluster (R) worked

Date: 2016-07-11 23:18:39

Tags: r parallel-processing

When I try to remove a cluster from my workspace with stopCluster, it does not seem to work. Below is the code I am using.

> cl <- makeCluster(3)
> cl
socket cluster with 3 nodes on host ‘localhost’
> stopCluster(cl)
> cl
socket cluster with 3 nodes on host ‘localhost’

Note that printing cl still reports a socket cluster with 3 nodes after I have supposedly removed it. Shouldn't I get an error that object cl is not found? How do I know that my cluster has actually been removed? A related question: if I close R, is the cluster terminated and is my computer returned to its normal state of being able to use all of its cores?

2 answers:

Answer 0 (score: 6)

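You should not get an error that cl is not found until you run rm(cl). Stopping a cluster does not remove the object from your environment.

Use showConnections to confirm that no connections are active: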

> require(parallel)
Loading required package: parallel
> cl <- makeCluster(3)
> cl
socket cluster with 3 nodes on host ‘localhost’
> showConnections()
  description         class      mode  text     isopen   can read can write
3 "<-localhost:11129" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
4 "<-localhost:11129" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
5 "<-localhost:11129" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
> stopCluster(cl)
> showConnections()
     description class mode text isopen can read can write
> 
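
The same check can be scripted instead of read off the console. This is a minimal sketch, assuming a default socket cluster from makeCluster; count_sock_conns is a hypothetical helper, not part of parallel, and it counts every open socket connection in the session, not only those that belong to cl:

```r
library(parallel)

# Hypothetical helper: number of open connections of class "sockconn".
count_sock_conns <- function() {
  conns <- showConnections()
  sum(conns[, "class"] == "sockconn")
}

before <- count_sock_conns()
cl <- makeCluster(2)
during <- count_sock_conns()   # one socket connection per worker is added
stopCluster(cl)
after <- count_sock_conns()    # back to the count from before makeCluster
```

Comparing during and after with before shows whether the connections opened for the workers have been closed again.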

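Whether your computer is "returned to its normal state" depends on the type of cluster you create. If it is just a simple socket or fork cluster, then gracefully stopping the parent process should cause all the child processes to terminate. If it is a more complicated cluster, terminating R may not stop all the jobs it started on the nodes.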
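
For a simple socket cluster, one way to make sure the workers are stopped even when the computation fails is to register stopCluster with on.exit. This is a minimal sketch; run_on_cluster and its arguments are hypothetical names, not part of parallel:

```r
library(parallel)

# Hypothetical wrapper: start workers, apply f to each element of xs,
# and stop the workers when the function exits, whether or not f failed.
run_on_cluster <- function(n_workers, xs, f) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, xs, f)
}

squares <- run_on_cluster(2, 1:4, function(x) x^2)
```

Because the cluster object never leaves the function, there is no stale cl left in the workspace afterwards.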

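Answer 1 (score: 1)

Unfortunately, the print.SOCKcluster method does not tell you whether the cluster object is usable. However, you can find out whether it is usable by printing the elements of the cluster object, which uses the print.SOCKnode method. For example: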

> library(parallel)
> cl <- makeCluster(3)
> for (node in cl) try(print(node))
node of a socket cluster on host ‘localhost’ with pid 29607
node of a socket cluster on host ‘localhost’ with pid 29615
node of a socket cluster on host ‘localhost’ with pid 29623
> stopCluster(cl)
> for (node in cl) try(print(node))
Error in summary.connection(connection) : invalid connection
Error in summary.connection(connection) : invalid connection
Error in summary.connection(connection) : invalid connection

Note that print.SOCKnode actually sends a message over the socket connection to get the process ID of the corresponding worker, as the source code shows:

> parallel:::print.SOCKnode
function (x, ...) 
{
    sendCall(x, eval, list(quote(Sys.getpid())))
    pid <- recvResult(x)
    msg <- gettextf("node of a socket cluster on host %s with pid %d", 
        sQuote(x[["host"]]), pid)
    cat(msg, "\n", sep = "")
    invisible(x)
}
<bytecode: 0x2f0efc8>
<environment: namespace:parallel>

So if you have called stopCluster on the cluster object, you will get an error when you try to use the socket connection.
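
That error can be turned into a yes/no check. This is a minimal sketch, assuming a socket cluster created with makeCluster; cluster_is_usable is a hypothetical helper, not part of parallel. It sends a trivial call to every worker and treats any error as "not usable":

```r
library(parallel)

# Hypothetical helper: TRUE if every worker answers a trivial call,
# FALSE if talking to the cluster raises an error (e.g. after stopCluster).
cluster_is_usable <- function(cl) {
  tryCatch({
    clusterEvalQ(cl, Sys.getpid())
    TRUE
  }, error = function(e) FALSE)
}

cl <- makeCluster(2)
usable_before <- cluster_is_usable(cl)
stopCluster(cl)
usable_after <- cluster_is_usable(cl)
rm(cl)  # only now does the object itself disappear from the workspace
```

Note that the check blocks until the workers reply, so it is best run while the cluster is idle.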