gRPC not working for distributed TensorFlow

Time: 2017-08-05 14:26:19

Tags: tensorflow grpc

I am running a distributed TensorFlow script. When I create the cluster server, the console shows the following messages:

  

E0805 20:51:03.294260965 3387 ev_epoll1_linux.c:1051] grpc epoll fd:3

     

2017-08-05 20:51:03.299766: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:2222}

2017-08-05 20:51:03.299790: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2223}

2017-08-05 20:51:03.305220: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:2223

When I start training, I get the same message and then no further output:

  

E0805 20:52:45.889979901 3387 ev_epoll1_linux.c:1051] grpc epoll fd:3

The message is printed when execution reaches the line with tf.Session("grpc://localhost:2223") as sess:

TensorFlow version: 1.3.0-rc0, built with bazel. It works on a single machine.

Linux version: Distributor ID: Ubuntu; Description: Ubuntu 14.04.5 LTS; Release: 14.04; Codename: trusty

The active Internet connections are:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:2222            0.0.0.0:*               LISTEN      8321/python
tcp        0      0 0.0.0.0:2223            0.0.0.0:*               LISTEN      8883/python
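
The same check can be made from Python. The following is a minimal sketch, assuming the two ports shown in the netstat output; the helper name port_is_open is hypothetical and not part of TensorFlow or gRPC. A successful TCP connect only shows that something is listening on the port, not that the gRPC server behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, OSError):
        # Connection refused, unreachable host, or timeout.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Ports taken from the netstat output above.
    for port in (2222, 2223):
        print(port, port_is_open("localhost", port))
```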

Below is the sample code that creates the cluster server:

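import tensorflow as tf

# NOTE: `cluster` and `FLAGS` are defined elsewhere in the script (not shown here).
def main(_):
    server = tf.train.Server(cluster,
                             job_name=FLAGS.job_name,
                             task_index=FLAGS.task_index)
    # Block and serve requests from the other tasks in the cluster.
    server.join()

if __name__ == "__main__":
    tf.app.run()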
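
The server code does not show how cluster is defined. The sketch below is an assumption inferred from the GrpcChannelCache log lines (one ps task on localhost:2222, one worker task on localhost:2223); the names cluster_def and session_target are hypothetical. tf.train.ClusterSpec accepts a dict of this shape, so in the real script the dict would presumably be wrapped in that class.

```python
# Assumed layout, inferred from the log output; the question does not
# show the actual definition of `cluster`.
cluster_def = {
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223"],
}


def session_target(cluster_def, job_name, task_index):
    """Return the grpc:// target string for one task (hypothetical helper)."""
    return "grpc://" + cluster_def[job_name][task_index]


# In the real script: cluster = tf.train.ClusterSpec(cluster_def)
print(session_target(cluster_def, "worker", 0))  # grpc://localhost:2223
```

The printed target matches the one passed to tf.Session in the training code, which is how the client picks the worker task it talks to.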

And the training code:

import numpy as np
import tensorflow as tf

train_X = np.random.rand(100).astype(np.float32)
train_Y = train_X * 0.1 + 0.3

with tf.device("/job:worker/task:0"):
    X = tf.placeholder(tf.float32)
    Y = tf.placeholder(tf.float32)
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    y = w * X + b
    loss = tf.reduce_mean(tf.square(y - Y))

    init_op = tf.global_variables_initializer()
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session("grpc://localhost:2223") as sess:
    sess.run(init_op)
    for i in range(500):
        sess.run(train_op, feed_dict={X: train_X, Y: train_Y})
        print("after sess.run train")
        if i % 50 == 0:
            print i, sess.run(w), sess.run(b)

    # Read the final values while the session is still open.
    print sess.run(w)
    print sess.run(b)
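
To check the model logic independently of gRPC and the cluster, the same linear fit can be run in plain Python. This is a minimal sketch rather than the code from the question: it reuses the learning rate, step count, and data generation from the training script, while fit_line and mean_squared_error are hypothetical helper names and the random seed is an arbitrary choice.

```python
import random


def mean_squared_error(w, b, xs, ys):
    """Mean of (w * x + b - y)^2 over the data set."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.01, steps=500):
    """Fit y = w * x + b by plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = float(len(xs))
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


random.seed(0)  # arbitrary seed, only for repeatability
train_X = [random.random() for _ in range(100)]
train_Y = [x * 0.1 + 0.3 for x in train_X]

w, b = fit_line(train_X, train_Y)
print(w, b, mean_squared_error(w, b, train_X, train_Y))
```

If this runs and the loss drops while the distributed version prints nothing after the gRPC message, the problem is more likely in the cluster setup than in the model itself.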

Does anyone know how to fix this? Thanks.

0 answers:

No answers yet.