TensorFlow shuffle_batch fraction: unexpected behavior

Time: 2016-10-30 09:11:17

Tags: machine-learning neural-network tensorflow

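I am training a convolutional neural network and I am seeing some unexpected behavior in the shuffle_batch fraction summary, or perhaps I just don't understand it. Can someone explain it? The only difference between the two plots is that I swapped the loss function.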

With this loss function, the fraction stays at 0.0:

loss = tf.nn.l2_loss(expected_labels-labels)

while this one gives me a constant 1.0 (after it reaches 1.0 for the first time):

loss = tf.reduce_mean(tf.square(expected_labels - labels))

Can changing the loss function really cause this difference? I'm not sure what it means.

plot

Edit: code as requested. The first part sets up the batching and the big picture.

filename_queue = tf.train.string_input_producer(filenames,
                                                num_epochs=None)
label, image = read_and_decode_single_example(filename_queue=filename_queue)
image = tf.image.decode_jpeg(image.values[0], channels=3)
jpeg = tf.cast(image, tf.float32) / 255.
jpeg.set_shape([66,200,3])
images_batch, labels_batch = tf.train.shuffle_batch(
    [jpeg, label], batch_size= FLAGS.batch_size,
    num_threads=8,
    capacity=60000,
    min_after_dequeue=10000)
images_placeholder, labels_placeholder = placeholder_inputs(
    FLAGS.batch_size)

label_estimations, W1_conv, h1_conv, current_images = e2e.inference(images_placeholder)

# Add to the Graph the Ops for loss calculation.
loss = e2e.loss(label_estimations, labels_placeholder)


# Decay once per epoch, using an exponential schedule starting at 0.01.


# Add to the Graph the Ops that calculate and apply gradients.
train_op = e2e.training(loss, FLAGS.learning_rate, FLAGS.batch_size)

Here are the methods for inference, loss, and training:

def inference(images):
    with tf.name_scope('conv1'):
        W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, FEATURE_MAPS_C1], stddev=STDDEV))
        b_conv1 = tf.Variable(tf.constant(BIAS_INIT, shape=[FEATURE_MAPS_C1]))
        h_conv1 = tf.nn.bias_add(
            tf.nn.conv2d(images, W_conv1, strides=[1, 2, 2, 1], padding='VALID'), b_conv1)

    with tf.name_scope('conv2'):
        W_conv2 = tf.Variable(tf.truncated_normal([5, 5, FEATURE_MAPS_C1, 36], stddev=STDDEV))
        b_conv2 = tf.Variable(tf.constant(BIAS_INIT, shape=[36]))
        h_conv2 = tf.nn.conv2d(h_conv1, W_conv2, strides=[1, 2, 2, 1], padding='VALID') + b_conv2

    with tf.name_scope('conv3'):
        W_conv3 = tf.Variable(tf.truncated_normal([5, 5, 36, 48], stddev=STDDEV))
        b_conv3 = tf.Variable(tf.constant(BIAS_INIT, shape=[48]))
        h_conv3 = tf.nn.conv2d(h_conv2, W_conv3, strides=[1, 2, 2, 1], padding='VALID') + b_conv3

    with tf.name_scope('conv4'):
        W_conv4 = tf.Variable(tf.truncated_normal([3, 3, 48, 64], stddev=STDDEV))
        b_conv4 = tf.Variable(tf.constant(BIAS_INIT, shape=[64]))
        h_conv4 = tf.nn.conv2d(h_conv3, W_conv4, strides=[1, 1, 1, 1], padding='VALID') + b_conv4

    with tf.name_scope('conv5'):
        W_conv5 = tf.Variable(tf.truncated_normal([3, 3, 64, 64], stddev=STDDEV))
        b_conv5 = tf.Variable(tf.constant(BIAS_INIT, shape=[64]))
        h_conv5 = tf.nn.conv2d(h_conv4, W_conv5, strides=[1, 1, 1, 1], padding='VALID') + b_conv5
        h_conv5_flat = tf.reshape(h_conv5, [-1, 1 * 18 * 64])

    with tf.name_scope('fc1'):
        W_fc1 = tf.Variable(tf.truncated_normal([1 * 18 * 64, 100], stddev=STDDEV))
        b_fc1 = tf.Variable(tf.constant(BIAS_INIT, shape=[100]))
        h_fc1 = tf.matmul(h_conv5_flat, W_fc1) + b_fc1

    with tf.name_scope('fc2'):
        W_fc2 = tf.Variable(tf.truncated_normal([100, 50], stddev=STDDEV))
        b_fc2 = tf.Variable(tf.constant(BIAS_INIT, shape=[50]))
        h_fc2 = tf.matmul(h_fc1, W_fc2) + b_fc2

    with tf.name_scope('fc3'):
        W_fc3 = tf.Variable(tf.truncated_normal([50, 10], stddev=STDDEV))
        b_fc3 = tf.Variable(tf.constant(BIAS_INIT, shape=[10]))
        h_fc3 = tf.matmul(h_fc2, W_fc3) + b_fc3

    with tf.name_scope('fc4'):
        W_fc4 = tf.Variable(tf.truncated_normal([10, 1], stddev=STDDEV))
        b_fc4 = tf.Variable(tf.constant(BIAS_INIT, shape=[1]))
        h_fc4 = tf.matmul(h_fc3, W_fc4) + b_fc4

    return h_fc4

This is the loss function; using l2 causes the problem.

def loss(label_estimations, labels):    
    n_labels = tf.reshape(label_estimations, [-1])
    # Here are the two loss functions
    #loss = tf.reduce_mean(tf.square(n_labels - labels))
    loss = tf.nn.l2_loss(n_labels-labels)
    return loss

The training method:

def training(loss, learning_rate, batch_size): 
    global_step = tf.Variable(0, name='global_step', trainable=False)
    tf.scalar_summary('learning_rate',learning_rate)
    tf.scalar_summary('Loss ('+loss.op.name+')', loss)

    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op

Plot for tf.reduce_sum(tf.square(n_labels - labels)/2):

[Imgur image]

2 answers:

Answer 0 (score: 1)

As stated in the official TensorFlow guide (https://www.tensorflow.org/programmers_guide/reading_data):
  

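How many threads do you need? The tf.train.shuffle_batch* functions add a summary to the graph that indicates how full the example queue is. If you have enough reading threads, that summary will stay above zero. You can view your summaries as training progresses using TensorBoard.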

It seems better if the queue never runs empty, i.e. "fraction_full" stays non-zero. If it does not, you should allocate more threads to the queue_runner.
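To make the summary values easier to read, here is a minimal pure-Python sketch of how the fraction appears to be computed. It assumes the formula used in the TF 0.x/1.x implementation of tf.train.shuffle_batch, max(0, queue_size - min_after_dequeue) / (capacity - min_after_dequeue). The function name fraction_full and the sample queue sizes are hypothetical; capacity and min_after_dequeue are the values from the question.

```python
def fraction_full(queue_size, capacity, min_after_dequeue):
    """Fraction of the queue filled above min_after_dequeue (assumed formula)."""
    usable = capacity - min_after_dequeue
    return max(0, queue_size - min_after_dequeue) / float(usable)


CAPACITY = 60000           # from the question
MIN_AFTER_DEQUEUE = 10000  # from the question

# Hypothetical queue sizes, chosen only for illustration.
for size in (5000, 10000, 35000, 60000):
    print(size, fraction_full(size, CAPACITY, MIN_AFTER_DEQUEUE))
```

Under that assumption, a reading of 0.0 means the queue holds no more than min_after_dequeue elements, so the readers are barely keeping up, while a constant 1.0 means the queue is full and the readers are waiting for the consumer.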

Answer 1 (score: 0)

The only difference between your loss and l2 is the scaling, so you may need to adjust your learning rate and other hyperparameters to account for it.

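The l2 loss in TF is defined as:

1/2 SUM_i^N (pred(x_i) - y_i)^2

while your cost is

1/N SUM_i^N (pred(x_i) - y_i)^2

Of course, since you are using a stochastic gradient method, what you are effectively using is an approximation of the form

1/2 SUM_{(x_i, y_i) in batch} (pred(x_i) - y_i)^2 # l2
1/#batch SUM_{(x_i, y_i) in batch} (pred(x_i) - y_i)^2 # you

Thus you would have to multiply your cost by batch_size / 2 to get the original cost. Usually this is not a problem, but sometimes the wrong scaling can put you in a very degenerate part of the error surface, and the optimizer will fail (especially a very aggressive one like Adam).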
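The scaling between the two losses can be checked with a small pure-Python sketch. The prediction and label values are made up, and l2_loss and mean_squared_error are hypothetical helpers that mirror the half-sum and per-batch-mean formulas rather than calls into TensorFlow.

```python
def l2_loss(preds, labels):
    """Half the sum of squared errors (the l2 form)."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / 2.0


def mean_squared_error(preds, labels):
    """Mean of squared errors over the batch (the reduce_mean form)."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)


# Made-up batch of 4 predictions and labels, for illustration only.
preds = [0.5, 1.5, -1.0, 2.0]
labels = [0.0, 1.0, 1.0, 2.0]

batch_size = len(preds)
l2 = l2_loss(preds, labels)
mse = mean_squared_error(preds, labels)

# The two losses differ only by the constant factor batch_size / 2.
print(l2, mse, mse * batch_size / 2.0)
```

The gradients are scaled by the same constant factor, which is why the learning rate may need revisiting when switching between the two.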

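Side note: are you aware that your model is a deep linear model? There are no nonlinearities anywhere in it. This is a very specific kind of network.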
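The side note can be illustrated with a small pure-Python sketch: two affine layers stacked without an activation in between collapse into a single affine layer. The weights are made up, and the helpers matvec, matmul, and affine are hypothetical names used only for this illustration.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def affine(W, b, x):
    """Compute W x + b."""
    return [s + bi for s, bi in zip(matvec(W, x), b)]


# Made-up weights for two stacked layers with no activation in between.
W1, b1 = [[1.0, 2.0], [0.0, -1.0]], [0.5, 0.5]
W2, b2 = [[3.0, 1.0]], [-1.0]

x = [2.0, 4.0]
stacked = affine(W2, b2, affine(W1, b1, x))

# Equivalent single layer: W = W2 W1 and b = W2 b1 + b2.
W = matmul(W2, W1)
b = affine(W2, b2, b1)
collapsed = affine(W, b, x)

print(stacked, collapsed)
```

The same argument applies to any number of stacked layers, convolutional ones included, since convolution is also linear. Inserting a nonlinearity such as tf.nn.relu between layers is the usual way to avoid the collapse.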