I am training a model and am switching from a single GPU to multiple GPUs. The single-GPU training code is:
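    import tensorflow as tf  # TensorFlow 1.x API

    # Open a session on the model's graph; the class is tf.Session (capital S).
    with tf.Session(graph=model.graph, config=sess_config) as sess:
        print("model initialized")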
The multi-GPU training code is:
    sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0),
                             logdir=log_file,
                             global_step=model.global_step,
                             init_op=init_op)
    with sv.managed_session(server.target, config=sess_config) as sess:
        print("model initialized")
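The multi-GPU training code does not initialize the graph in the session, whether I use sv.managed_session or the session's other functions.
It reports the following error: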
    Traceback (most recent call last):
      File "./train/main.py", line 22, in <module>
        with sv.managed_session(server.target, config=sess_config) as sess:
      File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
        return self.gen.next()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
        self.stop(close_summary_writer=close_summary_writer)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
        stop_grace_period_secs=self._stop_grace_secs)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
        six.reraise(*self._exc_info_to_raise)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
        start_standard_services=start_standard_services)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 709, in prepare_or_wait_for_session
        self._write_graph()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 612, in _write_graph
        self._logdir, "graph.pbtxt")
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_io.py", line 69, in write_graph
        text_format.MessageToString(graph_def))
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 421, in atomic_write_string_to_file
        write_string_to_file(temp_pathname, contents)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 306, in write_string_to_file
        f.write(file_content)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 102, in write
        self._prewrite_check()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 88, in _prewrite_check
        compat.as_bytes(self.__name), compat.as_bytes(self.__mode), status)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.FailedPreconditionError: /data/log/20181211-FMNN/graph.pbtxt.tmpdcf8121c37904e07adf8b1a0448635eb; Not a directory
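
The traceback ends while the Supervisor is writing graph.pbtxt under its logdir, and the message is "Not a directory". One possible reading, which is an assumption and not something the post confirms, is that some component of /data/log/20181211-FMNN already exists as a regular file (the variable name log_file hints at this). The sketch below uses only the Python standard library to check that; the helper name first_non_directory is hypothetical and is not part of TensorFlow.

```python
import os


def first_non_directory(path):
    """Return the shallowest existing component of `path` that is not a directory.

    Returns None when every existing component is a directory. Components
    that do not exist yet are ignored, since they could still be created.
    """
    path = os.path.abspath(path)
    parts = []
    head = path
    # Collect the path and all of its ancestors, deepest first.
    while True:
        parts.append(head)
        parent = os.path.dirname(head)
        if parent == head:  # reached the filesystem root
            break
        head = parent
    # Check from the root downwards so the shallowest offender is reported.
    for candidate in reversed(parts):
        if os.path.exists(candidate) and not os.path.isdir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Path taken from the error message above; replace it with your own logdir.
    logdir = "/data/log/20181211-FMNN"
    offender = first_non_directory(logdir)
    if offender is None:
        print("no non-directory component found in %s" % logdir)
    else:
        print("%s exists but is not a directory" % offender)
```

If the check does report a file, one thing to try is pointing logdir at a directory instead, or moving the file out of the way, since the traceback shows the Supervisor writing graph.pbtxt inside logdir. If it reports nothing, the cause is likely elsewhere.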