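I am training a convolutional neural network and I am seeing some unexpected behaviour in the shuffle_batch fraction summary, or maybe I just don't understand it. Can someone explain it? The only difference between these two plots is that I swapped the loss function.
With this loss function, the summary stays at 0.0: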
loss = tf.nn.l2_loss(expected_labels-labels)
whereas this one gives me a constant 1.0 (once it has reached 1.0 for the first time):
loss = tf.reduce_mean(tf.square(expected_labels - labels))
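Can a change of loss function really cause this difference? I am not sure what it means.
Edit: code as requested. The first part sets up the batching and the big picture.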
filename_queue = tf.train.string_input_producer(filenames,
                                                num_epochs=None)
label, image = read_and_decode_single_example(filename_queue=filename_queue)
image = tf.image.decode_jpeg(image.values[0], channels=3)
jpeg = tf.cast(image, tf.float32) / 255.
jpeg.set_shape([66, 200, 3])
images_batch, labels_batch = tf.train.shuffle_batch(
    [jpeg, label], batch_size=FLAGS.batch_size,
    num_threads=8,
    capacity=60000,
    min_after_dequeue=10000)
images_placeholder, labels_placeholder = placeholder_inputs(
    FLAGS.batch_size)
label_estimations, W1_conv, h1_conv, current_images = e2e.inference(images_placeholder)
# Add to the Graph the Ops for loss calculation.
loss = e2e.loss(label_estimations, labels_placeholder)
# Decay once per epoch, using an exponential schedule starting at 0.01.
# Add to the Graph the Ops that calculate and apply gradients.
train_op = e2e.training(loss, FLAGS.learning_rate, FLAGS.batch_size)
Here are the inference, loss and training methods:
def inference(images):
    with tf.name_scope('conv1'):
        W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, FEATURE_MAPS_C1], stddev=STDDEV))
        b_conv1 = tf.Variable(tf.constant(BIAS_INIT, shape=[FEATURE_MAPS_C1]))
        h_conv1 = tf.nn.bias_add(
            tf.nn.conv2d(images, W_conv1, strides=[1, 2, 2, 1], padding='VALID'), b_conv1)
    with tf.name_scope('conv2'):
        W_conv2 = tf.Variable(tf.truncated_normal([5, 5, FEATURE_MAPS_C1, 36], stddev=STDDEV))
        b_conv2 = tf.Variable(tf.constant(BIAS_INIT, shape=[36]))
        h_conv2 = tf.nn.conv2d(h_conv1, W_conv2, strides=[1, 2, 2, 1], padding='VALID') + b_conv2
    with tf.name_scope('conv3'):
        W_conv3 = tf.Variable(tf.truncated_normal([5, 5, 36, 48], stddev=STDDEV))
        b_conv3 = tf.Variable(tf.constant(BIAS_INIT, shape=[48]))
        h_conv3 = tf.nn.conv2d(h_conv2, W_conv3, strides=[1, 2, 2, 1], padding='VALID') + b_conv3
    with tf.name_scope('conv4'):
        W_conv4 = tf.Variable(tf.truncated_normal([3, 3, 48, 64], stddev=STDDEV))
        b_conv4 = tf.Variable(tf.constant(BIAS_INIT, shape=[64]))
        h_conv4 = tf.nn.conv2d(h_conv3, W_conv4, strides=[1, 1, 1, 1], padding='VALID') + b_conv4
    with tf.name_scope('conv5'):
        W_conv5 = tf.Variable(tf.truncated_normal([3, 3, 64, 64], stddev=STDDEV))
        b_conv5 = tf.Variable(tf.constant(BIAS_INIT, shape=[64]))
        h_conv5 = tf.nn.conv2d(h_conv4, W_conv5, strides=[1, 1, 1, 1], padding='VALID') + b_conv5
    h_conv5_flat = tf.reshape(h_conv5, [-1, 1 * 18 * 64])
    with tf.name_scope('fc1'):
        W_fc1 = tf.Variable(tf.truncated_normal([1 * 18 * 64, 100], stddev=STDDEV))
        b_fc1 = tf.Variable(tf.constant(BIAS_INIT, shape=[100]))
        h_fc1 = tf.matmul(h_conv5_flat, W_fc1) + b_fc1
    with tf.name_scope('fc2'):
        W_fc2 = tf.Variable(tf.truncated_normal([100, 50], stddev=STDDEV))
        b_fc2 = tf.Variable(tf.constant(BIAS_INIT, shape=[50]))
        h_fc2 = tf.matmul(h_fc1, W_fc2) + b_fc2
    with tf.name_scope('fc3'):
        W_fc3 = tf.Variable(tf.truncated_normal([50, 10], stddev=STDDEV))
        b_fc3 = tf.Variable(tf.constant(BIAS_INIT, shape=[10]))
        h_fc3 = tf.matmul(h_fc2, W_fc3) + b_fc3
    with tf.name_scope('fc4'):
        W_fc4 = tf.Variable(tf.truncated_normal([10, 1], stddev=STDDEV))
        b_fc4 = tf.Variable(tf.constant(BIAS_INIT, shape=[1]))
        h_fc4 = tf.matmul(h_fc3, W_fc4) + b_fc4
    return h_fc4
This is the loss function; using l2 is what causes the problem.
def loss(label_estimations, labels):
    n_labels = tf.reshape(label_estimations, [-1])
    # Here are the two loss functions
    #loss = tf.reduce_mean(tf.square(n_labels - labels))
    loss = tf.nn.l2_loss(n_labels - labels)
    return loss
The training method:
def training(loss, learning_rate, batch_size):
    global_step = tf.Variable(0, name='global_step', trainable=False)
    tf.scalar_summary('learning_rate', learning_rate)
    tf.scalar_summary('Loss (' + loss.op.name + ')', loss)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op
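# For reference: tf.nn.l2_loss(t) computes sum(t ** 2) / 2, so the l2 variant above is equivalent to
tf.reduce_sum(tf.square(n_labels - labels) / 2)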
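Answer 0 (score: 1)
As stated in the original TensorFlow guide, https://www.tensorflow.org/programmers_guide/reading_data:
"How many threads do you need? The tf.train.shuffle_batch* functions add a summary to the graph that indicates how full the example queue is. If you have enough reading threads, that summary will stay above zero. You can view your summaries as training progresses using TensorBoard."
So it seems better if the queue is never empty, i.e. "fraction_full" stays non-zero. If it does not, you should allocate more threads to the queue_runner.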
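To make the 0.0 and 1.0 readings concrete, here is a minimal sketch of how I understand the summary to be computed in TF 1.x: the number of queued elements above min_after_dequeue, divided by capacity - min_after_dequeue. The helper name fraction_full and the sample queue sizes are made up for illustration; capacity and min_after_dequeue are the values from the question. Treat the formula as an assumption and check it against your TensorFlow version.

```python
def fraction_full(queue_size, capacity, min_after_dequeue):
    """Assumed form of the shuffle_batch summary (hypothetical helper)."""
    usable = capacity - min_after_dequeue
    return max(0, queue_size - min_after_dequeue) / float(usable)


CAPACITY = 60000            # from the question
MIN_AFTER_DEQUEUE = 10000   # from the question

# Hypothetical queue sizes, chosen only to show the range of the summary.
for size in (5000, 10000, 35000, 60000):
    print(size, fraction_full(size, CAPACITY, MIN_AFTER_DEQUEUE))
```

Under that reading, a constant 0.0 would mean the consumer drains the queue down to min_after_dequeue as fast as the readers fill it, and a constant 1.0 would mean the readers keep the queue completely full.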
Answer 1 (score: 0)
The only difference between your loss and l2 is the scaling, so you may need to adjust your learning rate / other hyperparameters to account for it.
The l2 loss in TensorFlow is
1/2 SUM_i^N (pred(x_i) - y_i)^2
while your cost is
1/N SUM_i^N (pred(x_i) - y_i)^2
Of course, since you are using a stochastic gradient method, you are really using approximations of the form
1/2 SUM_{(x_i, y_i) in batch} (pred(x_i) - y_i)^2 # l2
1/#batch SUM_{(x_i, y_i) in batch} (pred(x_i) - y_i)^2 # you
so you would have to multiply your cost by #batch / 2 to recover the original cost. Usually this is not a problem, but sometimes the wrong scaling can put you in a very degenerate part of the error surface, and the optimizer will simply fail (especially an aggressive one like Adam).
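The scaling relationship can be checked with a few numbers. This is a plain-Python sketch, not TensorFlow code: l2_loss mirrors what tf.nn.l2_loss computes (half the sum of squares), mean_squared mirrors tf.reduce_mean(tf.square(...)), and the predictions and targets are made-up values.

```python
def l2_loss(errors):
    """Half the sum of squared errors, like tf.nn.l2_loss."""
    return sum(e * e for e in errors) / 2.0


def mean_squared(errors):
    """Mean of squared errors, like tf.reduce_mean(tf.square(...))."""
    return sum(e * e for e in errors) / len(errors)


# Made-up predictions and targets for a batch of four examples.
predictions = [0.5, -1.0, 2.0, 0.0]
targets = [1.0, -1.0, 1.0, 2.0]
errors = [p - t for p, t in zip(predictions, targets)]

batch = len(errors)
l2 = l2_loss(errors)
mse = mean_squared(errors)

# Multiplying the mean-squared cost by batch / 2 recovers the l2 cost.
print(l2, mse, mse * batch / 2.0)
```

The gradients differ by the same factor, so for a hypothetical batch size of 128 the l2 variant produces gradients 64 times larger than the mean-squared variant at the same learning rate.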
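Side note: are you aware that your model is a deep linear model? There are no non-linearities anywhere in it. That is a very specific kind of network.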
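To illustrate why that matters, here is a small plain-Python sketch with made-up 2x2 weights (biases omitted for brevity). Two stacked linear layers give the same output as one layer whose weight matrix is the product of the two, so the extra depth adds no expressive power. Putting a non-linearity such as ReLU between them breaks that equivalence. The helper names matmul and relu are defined here for illustration only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in m]


# Made-up weights and inputs.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0], [-1.0]]
x = [[1.0, 1.0], [-1.0, 2.0]]

two_layer = matmul(matmul(x, W1), W2)        # linear layer after linear layer
collapsed = matmul(x, matmul(W1, W2))        # one layer with W = W1 * W2
with_relu = matmul(relu(matmul(x, W1)), W2)  # non-linearity in between

print(two_layer, collapsed, with_relu)
```

Convolutions and bias additions are linear (affine) operations as well, so the same argument applies to the whole network in the question as posted.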