Unaccounted-for conditional nodes in TensorBoard

Date: 2017-06-08 11:54:00

Tags: tensorflow tensorboard

Question

When I start training, my preprocessed examples are created successfully, but the training itself never starts. Even stranger, when I inspect my TensorBoard graph I see some extra conditional nodes that do not exist in my code. I would like to know where these extra nodes come from, why they appear, and why training does not begin. Below is a systematic description of the situation:

TensorFlow graph

The following TensorBoard graph shows my graph: [TensorBoard graph screenshot]

The code that constructs this graph is given below:
import tensorflow as tf


def getconv2drelu(inputtensor, kernelsize, strides, padding, convname,
                  imagesummaries=False):
    weights = tf.get_variable("weights", shape=kernelsize, dtype=tf.float32,
                              initializer=tf.truncated_normal_initializer(0,
                                                                          0.01),
                              regularizer=tf.nn.l2_loss)
    biases = tf.get_variable("biases", shape=kernelsize[3], dtype=tf.float32,
                             initializer=tf.constant_initializer(0.0))

    conv = tf.nn.conv2d(input=inputtensor, filter=weights, strides=strides,
                        padding=padding, name=convname)
    response = tf.nn.bias_add(conv, biases)
    if imagesummaries:
        filters = (weights - tf.reduce_min(weights)) / (tf.reduce_max(
            weights) - tf.reduce_min(weights))
        filters = tf.transpose(filters, [3, 0, 1, 2])
        tf.summary.image(convname + " filters", filters,
                         max_outputs=kernelsize[3])
    response = tf.nn.relu(response)
    activation_summary(response)  # activation_summary is a user-defined summary helper (defined elsewhere)
    return response


def getfullyconnected(inputtensor, numinput, numoutput):
    weights = tf.get_variable("weights", shape=[numinput, numoutput],
                              dtype=tf.float32,
                              initializer=
                              tf.truncated_normal_initializer(0, 0.01))
    biases = tf.get_variable("biases", shape=[numoutput], dtype=tf.float32,
                             initializer=tf.truncated_normal_initializer(
                                 0, 0.01))
    response = tf.add(tf.matmul(inputtensor, weights), biases)
    response = tf.nn.relu(response)
    activation_summary(response)
    return response


def inference(inputs):
    with tf.variable_scope("layer1"):
        conv = getconv2drelu(inputtensor=inputs, kernelsize=[7, 7, 3, 96],
                             strides=[1, 2, 2, 1], padding="VALID",
                             convname="conv1", imagesummaries=True)
        pool = tf.nn.max_pool(conv, [1,  3, 3, 1], strides=[1, 3, 3, 1],
                              padding="SAME", name="pool1")

    with tf.variable_scope("layer2"):
        conv = getconv2drelu(inputtensor=pool, kernelsize=[7, 7, 96, 256],
                             strides=[1, 1, 1, 1], padding="VALID",
                             convname="conv2", imagesummaries=False)
        pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                              padding="SAME", name="pool2")

    with tf.variable_scope("layer3"):
        conv = getconv2drelu(inputtensor=pool, kernelsize=[7, 7, 256, 512],
                             strides=[1, 1, 1, 1], padding="SAME",
                             convname="conv3", imagesummaries=False)

    with tf.variable_scope("layer4"):
        conv = getconv2drelu(inputtensor=conv, kernelsize=[3, 3, 512, 512],
                             strides=[1, 1, 1, 1], padding="SAME",
                             convname="conv4", imagesummaries=False)

    with tf.variable_scope("layer5"):
        conv = getconv2drelu(inputtensor=conv, kernelsize=[3, 3, 512, 1024],
                             strides=[1, 1, 1, 1], padding="SAME",
                             convname="conv5", imagesummaries=False)

    with tf.variable_scope("layer6"):
        conv = getconv2drelu(inputtensor=conv, kernelsize=[3, 3, 1024, 1024],
                             strides=[1, 1, 1, 1], padding="SAME",
                             convname="conv6", imagesummaries=False)
        pool = tf.nn.max_pool(conv, [1, 3, 3, 1], strides=[1, 3, 3, 1],
                              padding="SAME", name="pool1")

        pool = tf.contrib.layers.flatten(pool)

    with tf.variable_scope("fc1"):
        fc = getfullyconnected(pool, 5 * 5 * 1024, 4096)
        drop = tf.nn.dropout(fc, keep_prob=0.5)

    with tf.variable_scope("fc2"):
        fc = getfullyconnected(drop, 4096, 4096)
        drop = tf.nn.dropout(fc, keep_prob=0.5)

    with tf.variable_scope("fc3"):
        logits = getfullyconnected(drop, 4096, 1000)

    return logits
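
For completeness, here is a minimal sketch of how this graph can be built and written out for TensorBoard. The log directory and the activation_summary stub are placeholders for illustration, not part of my original code; the input shape matches the [221, 221, 3] images described below.

def activation_summary(x):
    # stand-in for my summary helper (assumed to add a histogram summary)
    tf.summary.histogram(x.op.name + "/activations", x)

images = tf.placeholder(tf.float32, [None, 221, 221, 3], name="images")
logits = inference(images)

# Write the GraphDef so TensorBoard can render the graph
# ("/tmp/graph_logs" is an illustrative path)
writer = tf.summary.FileWriter("/tmp/graph_logs", tf.get_default_graph())
writer.close()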

The complete TensorBoard graph looks like this: [full TensorBoard graph screenshot]

The graph is too small to read here, but you can see a series of pink nodes on the left. An expanded view of one such segment is shown below:

[Screenshot: expanded view of the pink conditional-node segment]

Expanding one of these conditional blocks (they all look alike!) shows the following: [Screenshot: an expanded conditional block]

I cannot understand why these extra conditional blocks are present. All the images fed into the graph are of size [221, 221, 3].

You can also see isVariableInitialized checks inside the conditional blocks. I initialize my variables right after the session starts, so I do not understand why these checks are performed at all. I have found that these conditional blocks come from the use of tf.get_variable(), which checks whether the variables are initialized. Do these checks cause any performance difference?
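
To see exactly which scopes these blocks belong to, the following diagnostic sketch (not part of my training code; it assumes the graph above has already been built) lists every node in the default graph that sits inside a cond block or performs an initialization check:

import tensorflow as tf

for node in tf.get_default_graph().as_graph_def().node:
    if node.op == "IsVariableInitialized" or "/cond/" in node.name:
        print(node.name, node.op)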

Another observation: when I reduce the batch size, the size of my TensorBoard events file also shrinks, but the nodes shown in the graph stay the same. Why is that?
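
A quick way to check this (a sketch with an illustrative layer, not my model) is to count the nodes in the GraphDef for two different batch sizes; the count stays the same, so only the summary payloads written to the events file vary with the batch size:

import tensorflow as tf

for batch in (8, 64):
    tf.reset_default_graph()
    x = tf.placeholder(tf.float32, [batch, 221, 221, 3])
    y = tf.layers.conv2d(x, 96, 7, strides=2, padding="valid")
    # prints the same node count for both batch sizes
    print("batch=%d -> %d graph nodes"
          % (batch, len(tf.get_default_graph().as_graph_def().node)))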

My training code is as follows:

with tf.control_dependencies(putops):
    train_op = tf.group(apply_gradient_op, variables_averages_op)
sess.run(train_op)  # tf.Session() has been defined before sess.run()

putops is initialized to [] and, during graph construction, is filled for each GPU as follows:

from tensorflow.python.ops import data_flow_ops

# cpu_compute_stage is appended only once, since it corresponds to the
# centralized preprocessing
cpu_compute_stage = data_flow_ops.StagingArea(
    [tf.float32, tf.int32],
    shapes=[images_shape, labels_shape]
)
cpu_compute_stage_op = cpu_compute_stage.put(
    [host_images, host_labels])
putops.append(cpu_compute_stage_op)

# For each device, putops is further appended with the gpu_compute_stage put op
# (one per GPU), since a CPU-to-GPU copy has to take place; this runs inside
# the per-GPU construction loop, where i is the GPU index
with tf.device('/gpu:%d' % i):
    with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
        gpu_compute_stage = data_flow_ops.StagingArea(
            [tf.float32, tf.int32],
            shapes=[images_shape, labels_shape]
        )
        gpu_compute_stage_op = gpu_compute_stage.put(
            [host_images, host_labels]
        )
        putops.append(gpu_compute_stage_op)

However, even though I do initialize both the global and the local variables, my code still does not run.
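
For reference, below is a self-contained sketch of the put/get pattern I am trying to follow (illustrative names and a dummy loss, TF 1.x): the put op is attached to the train op via control_dependencies, and one extra warm-up put is run before the loop so that the first get does not wait on an empty stage.

import tensorflow as tf
from tensorflow.python.ops import data_flow_ops

host_images = tf.random_uniform([8, 221, 221, 3])
host_labels = tf.random_uniform([8], maxval=1000, dtype=tf.int32)

stage = data_flow_ops.StagingArea(
    [tf.float32, tf.int32],
    shapes=[host_images.get_shape(), host_labels.get_shape()])
put_op = stage.put([host_images, host_labels])
images, labels = stage.get()

w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
loss = tf.reduce_mean(images) * w  # dummy loss so the optimizer has a variable
apply_gradient_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.control_dependencies([put_op]):
    train_op = tf.group(apply_gradient_op)  # each step also refills the stage

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(put_op)  # warm-up put before the first training step
    for _ in range(5):
        sess.run(train_op)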

0 answers:

No answers yet.