LookupError with tf.cond() in TensorFlow

Time: 2017-09-08 11:00:52

Tags: tensorflow tensorflow-gpu

Environment

  • TensorFlow release version: 1.3.0-rc2
  • TensorFlow git version: v1.3.0-rc1-994-gb93fd37
  • Operating system: CentOS Linux release 7.2.1511 (Core)

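Problem description

I am using tf.cond() to switch between the training and validation datasets during processing. The following snippet shows how I have done it: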

with tf.variable_scope(tf.get_variable_scope()) as vscope:
        for i in range(4):
            with tf.device('/gpu:%d'%i):
                with tf.name_scope('GPU-Tower-%d'%i) as scope:
                    worktype = tf.get_variable("wt",[], initializer=tf.zeros_initializer())
                    worktype = tf.assign(worktype, 1)
                    workcondition = tf.equal(worktype, 1)
                    elem = tf.cond(workcondition,
                                   lambda: train_iterator.get_next(),
                                   lambda: val_iterator.get_next())
                    net =  vgg16cnn2(elem[0],numclasses=256)
                    img = elem[0]
                    centropy = tf.reduce_mean(
                        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=elem[1], logits=net))
                    reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES, scope)
                    regloss = 0.05 * tf.reduce_sum(reg_losses)
                    total_loss = centropy + regloss
                    t1 = tf.summary.scalar("Training Batch Loss", total_loss)
                    tf.get_variable_scope().reuse_variables()
                    predictions = tf.cast(tf.argmax(tf.nn.softmax(net), 1), tf.int32)
                    correct_predictions = tf.cast(tf.equal(predictions, elem[1]), tf.float32)
                    batch_accuracy = tf.reduce_mean(correct_predictions)
                    t2 = tf.summary.scalar("Training Batch Accuracy", batch_accuracy)
                    correct_detection.append(correct_predictions)
                    grads = optim.compute_gradients(total_loss)

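So, depending on the value of worktype, a mini-batch is fetched from either the training set or the validation set.

When I run this code, I get the following LookupError: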

LookupError: No gradient defined for operation 'GPU-Tower-0/cond/IteratorGetNext_1' (op type: IteratorGetNext)

Why does TensorFlow think that IteratorGetNext_1 needs a gradient? How can I fix this?

1 answer:

Answer 0: (score: 1)

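The variable worktype is marked as trainable. By default, Optimizer.compute_gradients(...) computes gradients with respect to all trainable variables, so it also tries to differentiate total_loss with respect to worktype. The only path from worktype to the loss runs through the tf.cond() predicate and the IteratorGetNext ops inside the branches, and no gradient is defined for that op type.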
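A quick way to check the first claim is to list the trainable variables after creating worktype the way the question does. This is only a minimal sketch: it assumes a TensorFlow build that ships the tf.compat.v1 shim (on 1.3 itself you would use a plain `import tensorflow as tf` and drop the `disable_v2_behavior()` call), and it builds the variable in a throwaway graph.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: running on a build with the compat.v1 shim

with tf.Graph().as_default():
    # Same call as in the question: no `trainable` argument, so it defaults to True.
    worktype = tf.get_variable("wt", [], initializer=tf.zeros_initializer())
    # Names of every variable the optimizer would differentiate by default.
    trainable_names = [v.name for v in tf.trainable_variables()]

print(trainable_names)  # ['wt:0']
```

Because wt shows up in that collection, compute_gradients(total_loss) includes it unless told otherwise.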

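There are two ways to fix this:

  1. Set trainable=False in the call to tf.get_variable(...).
  2. Use the var_list argument of Optimizer.compute_gradients(...) to explicitly specify the variables with respect to which gradients should be computed.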
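The two options above can be sketched together as follows. Treat this as an illustration under assumptions rather than a drop-in patch: the weight `w`, the input `x`, and the toy loss are hypothetical stand-ins for the VGG network and the dataset iterators in the question, and the tf.compat.v1 import is only there so the sketch also runs on newer TensorFlow builds (on 1.3, use `import tensorflow as tf` and omit `disable_v2_behavior()`).

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: running on a build with the compat.v1 shim

with tf.Graph().as_default():
    # Option 1: keep the mode flag out of the trainable-variables collection.
    worktype = tf.get_variable("wt", [], initializer=tf.zeros_initializer(),
                               trainable=False)

    # Hypothetical stand-ins for the model weights and an input batch.
    w = tf.get_variable("w", [3], initializer=tf.ones_initializer())
    x = tf.constant([1.0, 2.0, 3.0])

    # Stand-in for picking a training or a validation batch with tf.cond().
    loss = tf.cond(tf.equal(worktype, 1),
                   lambda: tf.reduce_sum(w * x),
                   lambda: tf.reduce_sum(2.0 * w * x))

    optim = tf.train.GradientDescentOptimizer(0.1)
    # Option 2: name the variables to differentiate explicitly.
    grads = optim.compute_gradients(loss, var_list=[w])

    trainable_names = [v.name for v in tf.trainable_variables()]
    grad_var_names = [v.name for _, v in grads]

print(trainable_names)
print(grad_var_names)
```

Either option alone should be enough to keep worktype out of the gradient computation. Option 1 is the smaller change when the flag is never meant to be learned, while option 2 is useful when you do not control how the variable is created.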