I am running a distributed TensorFlow script. When the cluster servers are created, the console shows the following:
E0805 20:51:03.294260965 3387 ev_epoll1_linux.c:1051] grpc epoll fd:3
2017-08-05 20:51:03.299766: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:2222}
2017-08-05 20:51:03.299790: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2223}
2017-08-05 20:51:03.305220: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:2223
When training, I only see the same message again, with no other output:
E0805 20:52:45.889979901 3387 ev_epoll1_linux.c:1051] grpc epoll fd:3
The message is printed at the line with tf.Session("grpc://localhost:2223") as sess:
TensorFlow version: 1.3.0-rc0, compiled with Bazel; it works fine on a single machine.
Linux version:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
Active Internet connections:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:2222 0.0.0.0:* LISTEN 8321/python
tcp 0 0 0.0.0.0:2223 0.0.0.0:* LISTEN 8883/python
Here is the sample code that creates the cluster server:
def main(_):
    server = tf.train.Server(cluster,
                             job_name=FLAGS.job_name,
                             task_index=FLAGS.task_index)
    server.join()

if __name__ == "__main__":
    tf.app.run()
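The cluster and FLAGS used above are not defined in the posted snippet. A minimal sketch that would make it runnable, with the host/port mapping reconstructed from the GrpcChannelCache log lines above (ps on 2222, worker on 2223); the flag names and defaults here are assumptions:

import tensorflow as tf

# Assumed cluster spec, matching the log output above:
# job ps -> {0 -> localhost:2222}, job worker -> {0 -> localhost:2223}
cluster = tf.train.ClusterSpec({
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223"],
})

# Assumed command-line flags; the actual definitions were not posted.
tf.app.flags.DEFINE_string("job_name", "worker", "Either 'ps' or 'worker'")
tf.app.flags.DEFINE_integer("task_index", 0, "Index of the task within its job")
FLAGS = tf.app.flags.FLAGS

With definitions like these, the two processes would be launched along the lines of python server.py --job_name=ps --task_index=0 and python server.py --job_name=worker --task_index=0 (server.py being a hypothetical file name).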
And the training code:
import numpy as np
import tensorflow as tf

# Synthetic data: y = 0.1 * x + 0.3
train_X = np.random.rand(100).astype(np.float32)
train_Y = train_X * 0.1 + 0.3

with tf.device("/job:worker/task:0"):
    X = tf.placeholder(tf.float32)
    Y = tf.placeholder(tf.float32)
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    y = w * X + b
    loss = tf.reduce_mean(tf.square(y - Y))
    init_op = tf.global_variables_initializer()
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session("grpc://localhost:2223") as sess:
    sess.run(init_op)
    for i in range(500):
        # Feed the inputs (train_X) and targets (train_Y).
        sess.run(train_op, feed_dict={X: train_X, Y: train_Y})
        print("after sess.run train")
        if i % 50 == 0:
            print(i, sess.run(w), sess.run(b))
    print(sess.run(w))
    print(sess.run(b))
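For reference, the usual between-graph replication pattern pins variables to the ps job rather than placing everything on the worker; a minimal sketch (not the code from this post, and reusing the cluster spec assumed above):

import tensorflow as tf

# Variables are placed on /job:ps, compute ops stay on the worker.
# `cluster` is the assumed ClusterSpec from the server sketch above.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:0", cluster=cluster)):
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)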
Does anyone know how to fix this? Thanks.