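I am trying the example provided here: https://github.com/ischlag/distributed-tensorflow-example. I have two machines: one acting as the server (ps) and the other as the worker. (TensorFlow is version 1.0.1 on both machines.)
I get the following error: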
Variables initialized ...
I tensorflow/core/distributed_runtime/master.cc:193] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
I tensorflow/core/distributed_runtime/master.cc:193] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
I tensorflow/core/distributed_runtime/master.cc:193] CreateSession still waiting for response from worker: /job:worker/replica:0/task:2
Answer 0 (score: 0)
I had a similar issue, which I was able to fix by adding a third node as a master to the ClusterSpec. My TF_CONFIG environment variable looks something like:
TF_CONFIG = {
    'cluster': {
        'master': ['master_node01:2222'],
        'ps': ['ps_node01:2222', ...],
        'worker': ['worker_node01:2222', ...]},
    'environment': 'cloud',
    'task': {'type': current_task, 'index': current_index}}
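TF_CONFIG is read from the environment as a JSON string, so the dict above has to be serialized before it is exported. A minimal sketch of doing that in Python, assuming hypothetical host names (master_node01, ps_node01, worker_node01, worker_node02) and a hypothetical helper make_tf_config; each process sets its own task type and index:

```python
import json
import os

# Hypothetical addresses; replace them with your own machines.
cluster = {
    "master": ["master_node01:2222"],
    "ps": ["ps_node01:2222"],
    "worker": ["worker_node01:2222", "worker_node02:2222"],
}


def make_tf_config(task_type, task_index):
    """Return the TF_CONFIG value (a JSON string) for one process."""
    return json.dumps({
        "cluster": cluster,
        "environment": "cloud",
        "task": {"type": task_type, "index": task_index},
    })


# Example: this process is the second worker (index 1 in the "worker" list).
os.environ["TF_CONFIG"] = make_tf_config("worker", 1)
```

Every machine shares the same "cluster" section; only the "task" entry differs between processes.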
Answer 1 (score: 0)
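I ran into the same problem. After several hours of debugging, I found it was caused by the cluster_spec being in the wrong order: the task_index did not match the host's position in the ps/worker lists. Once I fixed the order, the problem went away.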
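In a ClusterSpec, a task's index is simply its position in the job's address list, so /job:worker/task:1 means "the second address in the worker list". A minimal sanity check you could run on each machine before starting the server is sketched below. The addresses and the helper check_task are hypothetical; the same CLUSTER dict is what you would pass to tf.train.ClusterSpec, with job_name and task_index passed to tf.train.Server.

```python
# Hypothetical addresses; the same dict would be passed to tf.train.ClusterSpec.
CLUSTER = {
    "ps": ["ps_node01:2222"],
    "worker": ["worker_node01:2222", "worker_node02:2222"],
}


def check_task(cluster, job_name, task_index, my_address):
    """Raise ValueError unless my_address sits at position task_index of its job.

    A task's index is its position in the job's address list. If the list
    order and the index disagree, this process serves a task other than the
    one the rest of the cluster expects at this address.
    """
    hosts = cluster.get(job_name)
    if hosts is None:
        raise ValueError("unknown job %r" % job_name)
    if not 0 <= task_index < len(hosts):
        raise ValueError("task_index %d out of range for job %r (%d tasks)"
                         % (task_index, job_name, len(hosts)))
    if hosts[task_index] != my_address:
        actual = hosts.index(my_address) if my_address in hosts else None
        raise ValueError("%s is listed at index %r of job %r, but task_index is %d"
                         % (my_address, actual, job_name, task_index))
```

For example, on worker_node02 you would call check_task(CLUSTER, "worker", 1, "worker_node02:2222") and then start tf.train.Server(tf.train.ClusterSpec(CLUSTER), job_name="worker", task_index=1); passing task_index=0 there raises instead of silently starting the wrong task.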