I have a neural network in TensorFlow (1.8) whose output vector values are (seemingly) too large. Below is some output from about an hour into a training run; it is a classification network trained on three labels (i.e. classes). You can see the loss and accuracy on the training and test data, along with the standard deviation of the three raw output values (the Ypred
node in the listing below). These are the raw values before the softmax.
Compared to other NNs I have seen, these values seem far too large.
#1350000: Training Loss=89.791, Acc=0.735; Test Loss=79.961, Acc=0.792; RawStDev=[175.123, 382.312, 130.729, ]
#1360000: Training Loss=91.001, Acc=0.729; Test Loss=77.937, Acc=0.787; RawStDev=[172.724, 366.065, 134.253, ]
#1370000: Training Loss=86.340, Acc=0.751; Test Loss=83.953, Acc=0.773; RawStDev=[181.191, 383.081, 130.524, ]
#1380000: Training Loss=86.987, Acc=0.743; Test Loss=83.830, Acc=0.790; RawStDev=[182.473, 381.195, 137.126, ]
#1390000: Training Loss=88.804, Acc=0.729; Test Loss=79.096, Acc=0.787; RawStDev=[175.505, 371.759, 135.942, ]
#1400000: Training Loss=83.822, Acc=0.754; Test Loss=81.093, Acc=0.798; RawStDev=[173.978, 376.775, 136.153, ]
#1410000: Training Loss=85.469, Acc=0.735; Test Loss=79.343, Acc=0.793; RawStDev=[180.332, 386.373, 129.154, ]
#1420000: Training Loss=86.125, Acc=0.738; Test Loss=77.993, Acc=0.803; RawStDev=[190.086, 386.139, 129.828, ]
#1430000: Training Loss=85.288, Acc=0.732; Test Loss=82.180, Acc=0.782; RawStDev=[183.839, 381.932, 125.370, ]
#1440000: Training Loss=83.263, Acc=0.747; Test Loss=79.853, Acc=0.806; RawStDev=[177.329, 367.796, 125.690, ]
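To get a feel for why logits at this scale look suspicious, here is a small standalone illustration (the numbers are made up to roughly match the magnitudes in the log above; nothing here is taken from the actual network): once the values are spread over hundreds, the softmax saturates to an essentially one-hot vector, so the network is always maximally confident and the cross-entropy loss becomes very large on every misclassified example.

import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([175.0, 382.0, 130.0])  # roughly the scale seen in the log
print(softmax(logits))  # ~[1e-90, 1.0, 4e-110]: effectively a one-hot vector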
The code for the three (dense) layers and the optimizer definition is shown below:
with tf.name_scope('DensePost'):
    Xnew = tf.matmul(gathered, weights['Post1']) + biases['Post1']
    Xnew = tf.nn.dropout(Xnew, keep_prob)
    Xnew = tf.matmul(Xnew, weights['Post2']) + biases['Post2']
    Xnew = tf.nn.dropout(Xnew, keep_prob)
    Xnew = tf.matmul(Xnew, weights['Post3']) + biases['Post3']
    Xnew = tf.nn.dropout(Xnew, keep_prob)

with tf.name_scope('DenseOut'):
    # Linear activation
    Ypred = tf.add(tf.matmul(Xnew, weights['out']), biases['out'], name="Ypred_raw")
    # Compute softmax result
    YpredSoftMax = tf.nn.softmax(Ypred, dim=1, name="Prediction")
    YpredIndex = tf.argmax(YpredSoftMax, axis=1, name="PredIndex")

gd.node_name_output = "DenseOut/Ypred_raw"  # Critical: needed for freezing and for running inference in Caelum

# Loss, optimizer and evaluation; regularization term
# L2 loss keeps this over-parameterized network from overfitting the data
l2 = gd.lambda_loss_amount * sum(tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables())
gd.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=Ypred, labels=Ytrue)) + l2  # Softmax loss
gd.optimizer = tf.train.AdamOptimizer(learning_rate=gd.learning_rate).minimize(gd.cost)  # Adam optimizer
correct_pred = tf.equal(tf.argmax(Ypred, 1), tf.argmax(Ytrue, 1))
gd.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
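The code that produces the RawStDev numbers in the log is not part of the snippet above. A minimal sketch of one way to compute them, assuming they are the per-class standard deviations of Ypred over a batch (my reading, not something stated in the question), would be to continue the graph like this:

# Hedged sketch: per-class standard deviation of the raw logits over a batch.
# Ypred has shape [batch_size, 3]; tf.nn.moments along axis 0 gives the mean
# and variance per class, and the square root of the variance is the logged value.
raw_mean, raw_var = tf.nn.moments(Ypred, axes=[0])
raw_stdev = tf.sqrt(raw_var, name="RawStDev")
# sess.run(raw_stdev, feed_dict=...) would then yield values like [175.1, 382.3, 130.7]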
My question: Is something wrong with the training setup, or does it simply need more training time?
Answer 0 (score: 0):
The problem was indeed the initialization. When I switched the initializer on the fully connected layers, the values started getting smaller and settled down.
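The name of the initializer did not survive in this copy of the answer, so the following is only an assumption: a Glorot/Xavier-style initializer is a common choice in TF 1.x for keeping the pre-softmax activations at a reasonable scale, and defining the fully connected weights with it could look roughly like this (layer sizes are made up, not the ones from the question):

import tensorflow as tf

# Hedged sketch: Glorot/Xavier-style initialization for the dense-layer weights.
n_in, n_hidden, n_classes = 128, 64, 3  # hypothetical sizes
xavier = tf.contrib.layers.xavier_initializer()

weights = {
    'Post1': tf.get_variable('w_post1', [n_in, n_hidden], initializer=xavier),
    'out':   tf.get_variable('w_out',   [n_hidden, n_classes], initializer=xavier),
}
biases = {
    'Post1': tf.get_variable('b_post1', [n_hidden], initializer=tf.zeros_initializer()),
    'out':   tf.get_variable('b_out',   [n_classes], initializer=tf.zeros_initializer()),
}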