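I ran into this serious error. I have searched everywhere but found no leads. Please help.
Following the TensorFlow documentation on distributed training, I specified a ClusterSpec and set up distributed training. However, after about 18 hours of running, the following traceback appeared. How can I fix this?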
Traceback (most recent call last):
File "train.py", line 146, in <module>
tf.app.run()
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "train.py", line 138, in main
save_summaries_steps=10) as sess:
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 383, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 832, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 555, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1018, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1023, in _create_session
return self._sess_creator.create_session()
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 712, in create_session
self.tf_sess = self._session_creator.create_session()
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 525, in create_session
max_wait_secs=self._max_wait_secs
File "/data/home/tf3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 423, in wait_for_session
"Session was not ready after waiting %d secs." % (max_wait_secs,))
tensorflow.python.framework.errors_impl.DeadlineExceededError: Session was not ready after waiting 7200 secs.
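
The traceback ends in wait_for_session, which, as far as I understand, is what a non-chief worker calls while it waits for the chief to make the model ready. So one thing worth ruling out is whether every address in the ClusterSpec is still accepting connections. Below is a minimal sketch of such a check using only the Python standard library. The CLUSTER dict, its host names, and the helper names are hypothetical placeholders and not taken from train.py. An open port only shows that a process is listening there, not that the TensorFlow server behind it is healthy.

```python
import socket

# Hypothetical cluster layout (an assumption for illustration only);
# replace it with the dict that is passed to tf.train.ClusterSpec.
CLUSTER = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}


def split_address(address):
    """Split 'host:port' into (host, port) with the port as an int."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_reachable(address, timeout=3.0):
    """Return True if a TCP connection to 'host:port' can be opened."""
    host, port = split_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def unreachable_tasks(cluster, timeout=3.0):
    """List (job, task_index, address) for every task that cannot be reached."""
    missing = []
    for job, addresses in cluster.items():
        for index, address in enumerate(addresses):
            if not is_reachable(address, timeout):
                missing.append((job, index, address))
    return missing
```

Calling unreachable_tasks(CLUSTER) from each machine in the cluster returns an empty list when every task's port is open from that machine. Any entry it returns points at a task that may have died or become unreachable during the run.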