tensorflow.python.framework.errors_impl.FailedPreconditionError: /data/graph.pbtxt.tmpdcf8121c37904e07adf8b1a0448635eb; Not a directory

Time: 2019-02-25 11:53:21

Tags: tensorflow distributed

I am training a model and switching from a single GPU to multiple GPUs. The single-GPU training code:

with tf.Session(graph=model.graph, config=sess_config) as sess:
    print("model initialized")

The multi-GPU training code:

sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0),
                         logdir=log_file,
                         global_step=model.global_step,
                         init_op=init_op)
with sv.managed_session(server.target, config=sess_config) as sess:
    print("model initialized")

With the multi-GPU code, the graph is never initialized in the session, neither by sv.managed_session nor by any other session function.

The error reported:

Traceback (most recent call last):
File "./train/main.py", line 22, in <module>
  with sv.managed_session(server.target, config=sess_config) as sess:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
  return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
  self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
  stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
  six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
  start_standard_services=start_standard_services)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 709, in prepare_or_wait_for_session
  self._write_graph()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 612, in _write_graph
  self._logdir, "graph.pbtxt")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_io.py", line 69, in write_graph
  text_format.MessageToString(graph_def))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 421, in atomic_write_string_to_file
  write_string_to_file(temp_pathname, contents)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 306, in write_string_to_file
  f.write(file_content)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 102, in write
  self._prewrite_check()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 88, in _prewrite_check
  compat.as_bytes(self.__name), compat.as_bytes(self.__mode), status)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
  c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: /data/log/20181211-FMNN/graph.pbtxt.tmpdcf8121c37904e07adf8b1a0448635eb; Not a directory
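
The last frames show the failure happening while the Supervisor writes graph.pbtxt into logdir, and "Not a directory" is the operating system's message when some component of a path exists but is not a directory. One thing worth checking, then, is whether part of the log_file path is actually a regular file. The sketch below is only a diagnostic aid under that assumption: it assumes a local POSIX path, and the helper name find_non_directory_component is hypothetical, not part of TensorFlow.

```python
import os


def find_non_directory_component(path):
    """Return the first prefix of `path` that exists but is not a directory.

    Returns None when every prefix that exists is a directory.
    Hypothetical helper for illustration; assumes a local POSIX path.
    """
    path = os.path.abspath(path)
    current = os.sep
    for part in path.split(os.sep):
        if not part:
            # Skip the empty piece produced by the leading separator.
            continue
        current = os.path.join(current, part)
        if not os.path.exists(current):
            # Nothing exists from here on, so nothing can conflict.
            return None
        if not os.path.isdir(current):
            # A file sits where a directory is expected.
            return current
    return None
```

Calling find_non_directory_component(log_file) before constructing the Supervisor would report the offending path component, if there is one. A result of None means this particular cause can be ruled out on the machine where the check ran.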

0 answers:

No answers yet.