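I am trying to train a 4-layer neural network in TensorFlow to recognize letters of the alphabet. My accuracy is only about 10%, whereas a 3-layer network on the same dataset reaches 90%. For some iterations the loss is also nan. I can't seem to find the problem. Below is the code that builds the computation graph.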
batch_size = 128
beta = 0.01
inputs = image_size*image_size
hidden_neurons = [inputs, 1024, 512, 256,]
graph = tf.Graph()
with graph.as_default():
    global_step = tf.Variable(0)
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    # Hidden layer neurons
    tf_hidden_neurons_1 = tf.constant(1024)
    tf_hidden_neurons_2 = tf.constant(512)
    weights_1 = tf.Variable(
        tf.truncated_normal([image_size * image_size, tf_hidden_neurons_1]))
    biases_1 = tf.Variable(tf.zeros([tf_hidden_neurons_1]))
    weights_2 = tf.Variable(
        tf.truncated_normal([tf_hidden_neurons_1, tf_hidden_neurons_2]))
    biases_2 = tf.Variable(tf.zeros([tf_hidden_neurons_2]))
    weights_3 = tf.Variable(
        tf.truncated_normal([tf_hidden_neurons_2, num_labels]))
    biases_3 = tf.Variable(tf.zeros([num_labels]))
    # Training computation.
    reluActivations_1 = tf.nn.relu(tf.matmul(tf_train_dataset, weights_1) + biases_1)
    reluActivations_2 = tf.nn.relu(tf.matmul(reluActivations_1, weights_2) + biases_2)
    logits = tf.matmul(reluActivations_2, weights_3) + biases_3
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    # Add regularization
    regulizationTerm = tf.nn.l2_loss(weights_1) + tf.nn.l2_loss(weights_2) + tf.nn.l2_loss(weights_3)
    loss = tf.reduce_mean(loss + beta * regulizationTerm)
    # Optimizer.
    learning_rate = tf.train.exponential_decay(0.3, global_step, 100000, 0.7, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    # optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.relu(tf.matmul(tf_valid_dataset, weights_1) + biases_1)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_prediction, weights_2) + biases_2)
    test_prediction = tf.nn.relu(tf.matmul(tf_test_dataset, weights_1) + biases_1)
    test_prediction = tf.nn.softmax(tf.matmul(test_prediction, weights_2) + biases_2)
Any help would be appreciated.
Answer 0 (score: 0)
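I hope you are familiar with the vanishing and exploding gradient problems.
In most cases you get nan when the gradients either become too small or explode. Use np.clip
to clip the gradients (inside a TensorFlow graph the counterpart is tf.clip_by_value), and you should no longer get any nan. Once that is solved, we can move on to your low-accuracy problem.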
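To make the clipping suggestion concrete, here is a minimal NumPy sketch, not the asker's actual training loop. The helper names `clip_by_value` and `clip_by_global_norm`, the example gradient values, and the limit of 5.0 are all made up for illustration. In the TensorFlow 1.x graph from the question, the same idea would presumably go between `optimizer.compute_gradients(loss)` and `optimizer.apply_gradients(...)`, using `tf.clip_by_value` or `tf.clip_by_global_norm` instead of NumPy.

```python
import numpy as np


def clip_by_value(grads, limit):
    """Clip every gradient entry into [-limit, limit] with np.clip."""
    return [np.clip(g, -limit, limit) for g in grads]


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    # Scale is 1.0 when the norm is already small enough, otherwise < 1.0.
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm


# Hypothetical gradients for two weight matrices; one entry has blown up.
grads = [
    np.array([[0.1, -250.0], [0.3, 0.2]]),
    np.array([[3.0], [-4.0]]),
]

# Element-wise clipping: only the exploded entry changes.
clipped = clip_by_value(grads, limit=5.0)

# Global-norm clipping: every entry shrinks by the same factor,
# so the direction of the update is preserved.
rescaled, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = np.sqrt(sum(np.sum(np.square(g)) for g in rescaled))

print(clipped[0])
print(norm_before, norm_after)
```

Clipping only bounds finite values: an entry that is already nan stays nan after `np.clip`, so the clipping has to be applied on every step before the weights diverge.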