
Question

I want to train a network with TensorFlow, and I chose "Inception_resnet_v2" as the net (from here). This is my training code:

def train(train_dir, annotations, max_step, checkpoint_dir='./checkpoint2/'):
    # train the model
    features = tf.placeholder("float32", shape=[None, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNEL], name="features")
    labels = tf.placeholder("float32", [None], name="labels")
    one_hot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=80)
    keep_prob = tf.placeholder("float32")
    isTraining = tf.placeholder("bool")
    #train_step, cross_entropy, logits, keep_prob = network.inference(features, one_hot_labels)
    logits, _=inception_resnet_v2.inception_resnet_v2(features,80,isTraining,keep_prob)
    # calculate loss
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_labels, logits=logits))

    train_step = tf.train.AdamOptimizer(LEARNINGRATE).minimize(cross_entropy)

    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    image_list, label_list = scene_input2.get_files(train_dir, annotations)
    image_batch, label_batch = scene_input2.get_batch(image_list, label_list, IMAGE_SIZE, IMAGE_SIZE, BATCH_SIZE)

    with tf.Session() as sess:
        saver = tf.train.Saver()
        ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            print('Restore the model from checkpoint %s' % ckpt.model_checkpoint_path)
            # Restores from checkpoint
            saver.restore(sess, ckpt.model_checkpoint_path)
            start_step = int(ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1])
        else:
            sess.run(tf.global_variables_initializer())
            start_step = 0
            print('start training from new state')
        logger = scene_input.train_log(LOGNAME)

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        try:
            # Check if stop was requested.
            step=start_step
            while not coord.should_stop() and step<start_step + max_step:
                start_time = time.time()
                x, y = sess.run([image_batch, label_batch])
                #y = tf.one_hot(indices=tf.cast(y, tf.int32), depth=80)
                #y = sess.run(y)
                sess.run(train_step, feed_dict={features: x, labels: y, isTraining: True, keep_prob: 0.5})
                if step % 50 == 0:
                    train_accuracy = sess.run(accuracy, feed_dict={features: x, labels: y, isTraining: False, keep_prob: 1})
                    train_loss = sess.run(cross_entropy, feed_dict={features: x, labels: y, isTraining:False, keep_prob: 1})
                    duration = time.time() - start_time
                    logger.info("step %d: training accuracy %g, loss is %g (%0.3f sec)" % (step, train_accuracy, train_loss, duration))
                if step % 1000 == 1:
                    saver.save(sess, CHECKFILE, global_step=step)
                    print('writing checkpoint at step %s' % step)
                step=step+1

        except tf.errors.OutOfRangeError:
            print('done!')
        finally:
            # Request that the threads stop. After this is called, calls to should_stop() will return True.
            coord.request_stop()
        coord.join(threads)

But when I train the net, I get an error:

vision@Hjl:~/$ CUDA_VISIBLE_DEVICES=0 python3 scene2.py --mode train
Traceback (most recent call last):
  File "scene2.py", line 245, in <module>
    train(FLAGS.train_dir, FLAGS. annotations, FLAGS.max_step)
  File "scene2.py", line 82, in train
    logits, _=inception_resnet_v2.inception_resnet_v2(features,80,isTraining,keep_prob)
  File "/home/vision/inception_resnet_v2.py", line 357, in inception_resnet_v2
    scope='Dropout')
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1216, in dropout
    _scope=sc)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/layers/core.py", line 247, in __init__
    self.rate = min(1., max(0., rate))
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 564, in __bool__
    raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed. "
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.

The error occurs when I pass isTraining and keep_prob to inception_resnet_v2.inception_resnet_v2(features, 80, isTraining, keep_prob). How can I solve this?

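When training the network I want keep_prob = 0.5 and isTraining = True, but every 50 steps I also want to see the model's train_accuracy and train_loss, for which I should set keep_prob = 1.0 and isTraining = False. Is that right? How can I implement this?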

Answer 1

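If your end goal is to run training and evaluation at the same time, and you are using the neural network implementations provided by the tf-slim library, the simplest approach is to follow the methodology prescribed by tf-slim co-author Nathan Silberman.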

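In short, training and evaluation are performed by two separate processes. The evaluation process points at a checkpoint directory, waits (indefinitely) for the training process to write a new checkpoint to that directory, automatically runs evaluation on the newly written checkpoint, and writes summaries to the specified eval output directory.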
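The waiting behaviour described above can be pictured with a small, framework-free sketch. It only illustrates the polling idea and is not tf-slim's implementation; `iter_new_checkpoints`, `latest_checkpoint`, and `max_polls` are hypothetical names introduced here.

```python
import time


def iter_new_checkpoints(latest_checkpoint, poll_interval_secs=1.0, max_polls=None):
    """Yield each checkpoint path the first time it is seen.

    latest_checkpoint: zero-argument callable returning the newest checkpoint
        path, or None if nothing has been written yet (hypothetical helper).
    max_polls: stop after this many polls; None means wait forever.
    """
    last_seen = None
    polls = 0
    while max_polls is None or polls < max_polls:
        path = latest_checkpoint()
        if path is not None and path != last_seen:
            last_seen = path
            yield path  # the caller would run one evaluation pass here
        else:
            time.sleep(poll_interval_secs)  # nothing new yet, keep waiting
        polls += 1
```

In the real setup the callable would wrap something like `tf.train.latest_checkpoint(checkpoint_dir)`, and the body of the consuming loop would be the evaluation run.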

First, you should look at the train_image_classifier.py and eval_image_classifier.py scripts provided in the TensorFlow-Slim image classification model library.

In eval_image_classifier.py, you need to replace the code:

if tf.gfile.IsDirectory(FLAGS.checkpoint_path):
  checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path)
else:
  checkpoint_path = FLAGS.checkpoint_path

tf.logging.info('Evaluating %s' % checkpoint_path)

slim.evaluation.evaluate_once(
    master=FLAGS.master,
    checkpoint_path=checkpoint_path,
    logdir=FLAGS.eval_dir,
    num_evals=num_batches,
    eval_op=list(names_to_updates.values()),
    variables_to_restore=variables_to_restore)

with the code:

tf.logging.info('Evaluating %s' % FLAGS.checkpoint_path)

slim.evaluation.evaluation_loop(
    master=FLAGS.master,
    checkpoint_dir=FLAGS.checkpoint_path,
    logdir=FLAGS.eval_dir,
    num_evals=num_batches,
    eval_op=list(names_to_updates.values()),
    variables_to_restore=variables_to_restore)

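If you want both processes to use your GPU without running into OOM errors, you can allocate a fraction of GPU memory to each process by creating a ConfigProto object and passing it as the session_config argument of slim.learning.train() or slim.evaluation.evaluation_loop(). See the "Allowing GPU memory growth" section of this tensorflow.org article for reference.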
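A minimal sketch of such a config, assuming TensorFlow 1.x. The helper name `make_session_config` and the 0.4 fraction are illustrative choices, not values from the answer.

```python
def make_session_config(gpu_fraction=0.4):
    """Build a ConfigProto that caps this process's share of GPU memory.

    gpu_fraction is an assumed example value; pick fractions for the train
    and eval processes that sum to less than 1.0.
    """
    # Imported here so that merely defining the helper does not require
    # TensorFlow to be installed (TensorFlow 1.x API assumed).
    import tensorflow as tf

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction)
    return tf.ConfigProto(gpu_options=gpu_options)
```

Each script would then pass `session_config=make_session_config(0.4)` to its own `slim.learning.train()` or `slim.evaluation.evaluation_loop()` call.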

Regarding the is_training parameter, you will notice that the train and eval scripts pass True and False respectively as the is_training argument of nets_factory.get_network_fn().

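Regarding the keep_prob parameter, nets_factory does not expose the dropout_keep_prob argument of the slim nets. Instead, slim.dropout() accepts is_training as an argument and replaces the computation that makes up dropout with an identity function. In other words, tf-slim is so awesome that it automatically "disables" dropout when you pass is_training=False to nets_factory.get_network_fn(), as is the case in eval_image_classifier.py.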
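The identity behaviour can be illustrated without TensorFlow. The function below is a plain-Python sketch of inverted dropout on a list of floats, not slim's code; its name and the `rng` parameter are introduced here for illustration.

```python
import random


def dropout(values, keep_prob=0.5, is_training=True, rng=random):
    """Inverted dropout on a list of floats.

    When is_training is False the input passes through unchanged, so
    keep_prob has no effect at evaluation time.
    """
    if not is_training:
        return list(values)  # identity: nothing is dropped or rescaled
    # Keep each value with probability keep_prob and rescale the survivors
    # so the expected sum stays the same.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]
```

This is why the eval script does not need a separate keep_prob = 1.0 setting: passing is_training=False already has that effect.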

If you want to expose dropout_keep_prob directly to train_image_classifier.py (for hyperparameter tuning, for example), you will have to fiddle with the implementation of nets_factory.get_network_fn().
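For a single-script loop like the one in the question, a commonly used alternative (not what this answer proposes) is to build the network twice with plain Python values and share the weights through `reuse=True`. The sketch below assumes the network function accepts `is_training`, `dropout_keep_prob`, and `reuse` keyword arguments, as slim's `inception_resnet_v2` does; `build_train_and_eval_logits` is a hypothetical helper name.

```python
def build_train_and_eval_logits(network_fn, features, num_classes=80,
                                train_keep_prob=0.5):
    """Create a training head and an evaluation head over shared variables.

    network_fn: callable with the slim-style signature
        network_fn(inputs, num_classes, is_training=..., dropout_keep_prob=..., reuse=...)
        returning (logits, end_points).
    """
    # Training head: dropout active, batch-norm in training mode.
    train_logits, _ = network_fn(features, num_classes,
                                 is_training=True,
                                 dropout_keep_prob=train_keep_prob)
    # Evaluation head: same variables (reuse=True), dropout disabled.
    eval_logits, _ = network_fn(features, num_classes,
                                is_training=False,
                                dropout_keep_prob=1.0,
                                reuse=True)
    return train_logits, eval_logits
```

The loss and train_step would be built on `train_logits`, while the accuracy and loss logged every 50 steps would be built on `eval_logits`, so no placeholder for isTraining or keep_prob is needed.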

Answer 2

If you use this method, it expects Python boolean and float values, not tensors. So you need to pass values like

keep_prob = 0.5
isTraining = True

instead of

keep_prob = tf.placeholder("float32")
isTraining = tf.placeholder("bool")

Update

But if you need to feed them at training time, I think the simplest way is to edit the inception_resnet_v2 method parameters at this line as shown below (removing the default parameter values):

def inception_resnet_v2(inputs, num_classes, is_training, dropout_keep_prob, reuse=None, scope='InceptionResnetV2', create_aux_logits=True, activation_fn=tf.nn.relu):

(The default on num_classes is dropped as well, because Python does not allow a parameter without a default to follow one that has a default.)

Then you will be able to pass isTraining and keep_prob. Hope it helps.
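To see why the traceback in the question ends in `__bool__`, here is a TensorFlow-free sketch. `FakeTensor` is a hypothetical stand-in for a symbolic tensor, and `clamp_rate` copies the `min(1., max(0., rate))` expression shown in the traceback.

```python
class FakeTensor:
    """Stand-in for a symbolic tensor: it has no concrete value at
    graph-construction time, so it refuses to be used as a Python bool."""

    def __gt__(self, other):
        return FakeTensor()  # comparisons yield another symbolic value

    def __lt__(self, other):
        return FakeTensor()

    def __bool__(self):
        raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed.")


def clamp_rate(rate):
    # Python's built-in min/max must turn each comparison into True/False.
    return min(1., max(0., rate))


print(clamp_rate(0.5))  # a plain float is clamped without trouble
try:
    clamp_rate(FakeTensor())
except TypeError as err:
    print("TypeError:", err)
```

The clamp runs while the graph is being built, on whatever object arrives as the rate. This suggests that, in the TensorFlow version from the traceback, a placeholder passed as keep_prob reaches the same check whether or not the parameter has a default value.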

How to pass parameters in TensorFlow
