TensorFlow: Slim training loop crashes with "No session factory registered"

Time: 2017-01-23 14:50:39

Tags: tensorflow tf-slim

I have a working TF installation, and Slim itself works fine too.

However, when I try to run a Slim training loop, my application crashes.

Minimal code:

import tensorflow as tf
import tensorflow.contrib.slim as slim


# Load data.
...


graph = tf.Graph()
with graph.as_default():

    # Build model
    ...

    # Add losses
    ...

    # Create training operation and start the actual training loop.
    train_op = ...

    # Start training loop

    slim.learning.train(
        train_op,
        logdir=FLAGS.logdir,
        save_summaries_secs=FLAGS.save_summaries_secs,
        save_interval_secs=FLAGS.save_interval_secs,
        master=FLAGS.master,
        is_chief=(FLAGS.task == 0),
        startup_delay_steps=(FLAGS.task * 20),
        log_every_n_steps=FLAGS.log_every_n_steps)

When I run it, I get:

E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "local" config: } Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
Traceback (most recent call last):
File "tensorflow/tensorflow/contrib/my_package/python/my_package/train.py", line 467, in <module>
    app.run()
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "tensorflow/tensorflow/contrib/my_package/python/my_package/train.py", line 462, in main
    log_every_n_steps=FLAGS.log_every_n_steps)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 776, in train
    master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 973, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 801, in stop
    stop_grace_period_secs=self._stop_grace_secs)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 386, in join
    six.reraise(*self._exc_info_to_raise)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 962, in managed_session
    start_standard_services=start_standard_services)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 719, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 256, in prepare_session
    config=config)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 161, in _restore_checkpoint
    sess = session.Session(self._target, graph=self._graph, config=config)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1187, in __init__
    super(Session, self).__init__(target, graph, config=config)
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 552, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
File "$HOME/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "local" config: } Registered factories are {DIRECT_SESSION, GRPC_SESSION}.

In contrast, the same model trains fine when I call train_op "manually":

with tf.Session(graph=graph) as sess:
    tf.global_variables_initializer().run()

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    for step in xrange(FLAGS.max_steps):
        _, summaries = sess.run([train_op, summary_op])
        ...
    coord.request_stop()
    coord.join(threads)

Does anyone know where to start debugging?

Thanks, Philip

1 answer:

Answer 0: (score: 1)

It looks like this line is causing the problem:

    master=FLAGS.master,

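As the error message shows, Slim is trying to create the session as sess = tf.Session("local"), and "local" is not a valid session target here: of the two registered factories, DIRECT_SESSION handles the empty target and GRPC_SESSION handles targets of the form grpc://host:port. Try passing the flag --master="" when running the script, or explicitly set master="" when calling slim.learning.train().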
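
To catch this kind of mistake before it reaches slim.learning.train(), the flag could be checked up front. The following is only a minimal sketch under some assumptions: it parses the flag with argparse instead of the flags module used in the question, resolve_master and parse_args are hypothetical helper names (not part of TensorFlow or Slim), and grpc://localhost:2222 is just an illustrative address. The check mirrors the two factories named in the error message.

```python
import argparse


def resolve_master(master):
    """Return `master` if a stock TensorFlow 1.x build should be able to open it.

    Hypothetical helper, not a TensorFlow or Slim API. The empty string
    selects the in-process session and "grpc://host:port" selects a
    distributed one; anything else, such as "local", is rejected early.
    """
    if master == "" or master.startswith("grpc://"):
        return master
    raise ValueError(
        "Unsupported session target %r; use '' or 'grpc://host:port'." % master)


def parse_args(argv=None):
    """Parse command-line flags; --master defaults to the in-process target."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--master", default="",
                        help="Session target: '' or grpc://host:port.")
    return parser.parse_args(argv)


# Example: no flags given, so the in-process target is used.
args = parse_args([])
master = resolve_master(args.master)
# The checked value would then be passed on, e.g.:
#   slim.learning.train(train_op, logdir=..., master=master, ...)
```

With this in place, a bad value such as "local" fails with a readable ValueError at startup rather than deep inside the Supervisor. Whether to raise or to fall back silently to "" is a design choice; raising seemed safer here because it makes a bad default visible.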