Huge loss value, then NaN, with regularization and dropout in a deep neural network

Posted: 2017-07-12 17:15:43

Tags: machine-learning tensorflow neural-network deep-learning nan

I'm working through Udacity's Deep Learning course. One of the assignments is to implement regularization and dropout in a multi-layer neural network.

After implementing them, my minibatch loss is insanely high at step 0, goes to infinity at step 1, and is NaN for the rest of the output:

Offset at step 0: 0
Minibatch loss at step 0: 187359330304.000000
Minibatch accuracy: 10.2%
Validation accuracy: 10.0% 

Offset at step 1: 128
Minibatch loss at step 1: inf
Minibatch accuracy: 14.1%
Validation accuracy: 10.0% 

Offset at step 2: 256
Minibatch loss at step 2: nan
Minibatch accuracy: 7.8%
Validation accuracy: 10.0% 

Offset at step 3: 384
Minibatch loss at step 3: nan
Minibatch accuracy: 11.7%
Validation accuracy: 10.0% 

Below is all the relevant code. I'm fairly sure it has nothing to do with how I do the optimization (since that was taken from the given assignment) or with my regularization, so I'm not sure where else the problem could be. I've also played around with the number of nodes in the hidden layers (1024 > 300 > 60), but it does the same thing.

Here is my code:

batch_size = 128
num_nodes_1 = 768
num_nodes_2 = 1024
num_nodes_3 = 512
dropout_value = 0.5
beta = 0.01

graph = tf.Graph()
with graph.as_default():

    tf_train_data = tf.placeholder(tf.float32, shape=(batch_size, image_size*image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_data = tf.constant(valid_dataset)
    tf_test_data = tf.constant(test_dataset)

    def gen_weights_biases(input_size, output_size):
        weights = tf.Variable(tf.truncated_normal([input_size, output_size]))
        biases = tf.Variable(tf.zeros([output_size]))
        return weights, biases

    weights_1, biases_1 = gen_weights_biases(image_size*image_size, num_nodes_1)
    weights_2, biases_2 = gen_weights_biases(num_nodes_1, num_nodes_2)
    weights_3, biases_3 = gen_weights_biases(num_nodes_2, num_nodes_3)
    weights_4, biases_4 = gen_weights_biases(num_nodes_3, num_labels)

    logits_1 = tf.matmul(tf_train_data, weights_1) + biases_1
    h_layer_1 = tf.nn.relu(logits_1)
    h_layer_1 = tf.nn.dropout(h_layer_1, dropout_value)

    logits_2 = tf.matmul(h_layer_1, weights_2) + biases_2
    h_layer_2 = tf.nn.relu(logits_2)
    h_layer_2 = tf.nn.dropout(h_layer_2, dropout_value)

    logits_3 = tf.matmul(h_layer_2, weights_3) + biases_3
    h_layer_3 = tf.nn.relu(logits_3)
    h_layer_3 = tf.nn.dropout(h_layer_3, dropout_value)

    logits_4 = tf.matmul(h_layer_3, weights_4) + biases_4

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_4))
    regularization = tf.nn.l2_loss(logits_1) + tf.nn.l2_loss(logits_2) + tf.nn.l2_loss(logits_3) + tf.nn.l2_loss(logits_4)
    reg_loss = tf.reduce_mean(loss + regularization * beta)

    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(0.5, global_step, 750, 0.8)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(reg_loss, global_step=global_step)

    train_prediction = tf.nn.softmax(logits_4)

    def make_prediction(input_data):
        p_logits_1 = tf.matmul(input_data, weights_1) + biases_1
        p_layer_1 = tf.nn.relu(p_logits_1)
        p_logits_2 = tf.matmul(p_layer_1, weights_2) + biases_2
        p_layer_2 = tf.nn.relu(p_logits_2)
        p_logits_3 = tf.matmul(p_layer_2, weights_3) + biases_3
        p_layer_3 = tf.nn.relu(p_logits_3)

        p_logits_4 = tf.matmul(p_layer_3, weights_4) + biases_4
        return tf.nn.relu(p_logits_4)

    valid_prediction = make_prediction(tf_valid_data)
    test_prediction = make_prediction(tf_test_data)

num_steps = 10001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized \n")

    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]

        feed_dict = {tf_train_data:batch_data, tf_train_labels:batch_labels}

        _, l, predictions = session.run([optimizer, reg_loss, train_prediction], feed_dict=feed_dict)

        if(step % 1 == 0):
            print("Offset at step %d: %d" % (step, offset))
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%% \n" % accuracy(valid_prediction.eval(), valid_labels))

    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Why is this happening, and how can I fix it?

1 Answer:

Answer 0 (score: 0):

The problem was the standard deviation of the weights. I'm not sure why this fixes it; if someone can explain, I'd appreciate it. In any case, the fix was:

import math

def gen_weights_biases(input_size, output_size):
    # scale the initial weights down by sqrt(2/input_size)
    weights = tf.Variable(tf.truncated_normal([input_size, output_size], stddev=math.sqrt(2.0/input_size)))
    biases = tf.Variable(tf.zeros([output_size]))
    return weights, biases

The beta rate also had to be lowered to 0.0001.
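One way to see why the sqrt(2/input_size) scaling matters (a quick numpy sketch of the standard variance argument, with shapes matching the first layer above): each pre-activation is a sum of input_size weighted inputs, so with unit-variance weights its standard deviation grows like sqrt(input_size), and every further layer multiplies the scale up again.

import numpy as np

np.random.seed(0)
x = np.random.randn(128, 784)  # a batch of roughly unit-scale inputs

w_unscaled = np.random.randn(784, 768)                     # stddev = 1, as before the fix
w_scaled = np.random.randn(784, 768) * np.sqrt(2.0 / 784)  # stddev = sqrt(2/input_size)

print(np.std(x.dot(w_unscaled)))  # ~28, i.e. sqrt(784): explodes layer after layer
print(np.std(x.dot(w_scaled)))    # ~1.4, i.e. sqrt(2): stays near unit scale

This is the same scaling used by He initialization, which is commonly recommended for ReLU networks.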