I am learning TensorFlow by implementing a simple logistic regression classifier that, given an MNIST image as input, outputs whether the digit is a 7. I am using stochastic gradient descent. The crux of the TensorFlow code is:
# Maximum number of epochs
MaxEpochs = 1
# Learning rate
eta = 1e-2
ops.reset_default_graph()
n_x = 784
n_y = 1
x_tf = tf.placeholder(tf.float32, shape = [n_x, 1], name = 'x_tf')
y_tf = tf.placeholder(tf.float32, shape = [n_y, 1], name = 'y_tf')
w_tf = tf.get_variable(name = "w_tf", shape = [n_x, 1], initializer = tf.initializers.random_uniform())
b_tf = tf.get_variable(name = "b_tf", shape = [n_y, 1], initializer = tf.initializers.random_uniform())
z_tf = tf.add(tf.matmul(w_tf, x_tf, transpose_a = True), b_tf, name = 'z_tf')
yPred_tf = tf.sigmoid(z_tf, name = 'yPred_tf')
Loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(logits = yPred_tf, labels = y_tf, name = 'Loss_tf')
with tf.name_scope('Training'):
    optimizer_tf = tf.train.GradientDescentOptimizer(learning_rate = eta)
    train_step = optimizer_tf.minimize(Loss_tf)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for Epoch in range(MaxEpochs):
        for Sample in range(len(XTrain)):
            x = XTrain[Sample]
            y = YTrain[Sample].reshape([-1, 1])
            Train_sample = {x_tf: x, y_tf: y}
            sess.run(train_step, feed_dict = Train_sample)
toc = time.time()
print('\nElapsed time is: ', toc - tic, 's')
It builds the following graph (the TensorBoard-related code has been removed for convenience):
The problem is that the neuron is not training at all, even though the weights and bias are initialized randomly (non-zero). The weight histogram is below.
I hate to post something so trivial, but I have hit a wall. Sorry for the long post, and thank you very much in advance for any guidance. As a slight aside: this takes 93.35 s to run, while the same stochastic implementation in numpy takes only about 10 s. Why would that be?
Edit: the full code, in case the problem is somewhere I had not thought to look:
import tensorflow as tf
import numpy as np
import h5py
from tensorflow.python.framework import ops
import time
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
def Flatten(Im):
    FlatImArray = Im.reshape([Im.shape[0], -1, 1])
    return FlatImArray
DigitTested = 7
# Separating the images with 7s from the rest
TrainIdxs = []
for i in range(len(y_train)):
    if(y_train[i] == DigitTested):
        TrainIdxs.append(i)
TestIdxs = []
for i in range(len(y_test)):
    if(y_test[i] == DigitTested):
        TestIdxs.append(i)
# Preparing the Datasets for training and testing
XTrain = Flatten(x_train)
YTrain = np.zeros([len(x_train), 1])
YTrain[TrainIdxs] = 1
XTest = Flatten(x_test)
YTest = np.zeros([len(x_test), 1])
YTest[TestIdxs] = 1
tic = time.time()
# Maximum number of epochs
MaxEpochs = 1
# Learning rate
eta = 1e-2
# Number of Epochs after which the neuron is validated
ValidationInterval = 1
ops.reset_default_graph() # to be able to rerun the model without overwriting tf variables
n_x = 784
n_y = 1
x_tf = tf.placeholder(tf.float32, shape = [n_x, 1], name = 'x_tf')
y_tf = tf.placeholder(tf.float32, shape = [n_y, 1], name = 'y_tf')
w_tf = tf.get_variable(name = "w_tf", shape = [n_x, 1], initializer = tf.initializers.random_uniform())
b_tf = tf.get_variable(name = "b_tf", shape = [n_y, 1], initializer = tf.initializers.random_uniform())
z_tf = tf.add(tf.matmul(w_tf, x_tf, transpose_a = True), b_tf, name = 'z_tf')
yPred_tf = tf.sigmoid(z_tf, name = 'yPred_tf')
Loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(logits = yPred_tf, labels = y_tf, name = 'Loss_tf')
with tf.name_scope('Training'):
    optimizer_tf = tf.train.GradientDescentOptimizer(learning_rate = eta)
    train_step = optimizer_tf.minimize(Loss_tf)
writer = tf.summary.FileWriter(r"C:\Users\braja\Documents\TBSummaries\MNIST1NTF\2")
tf.summary.histogram('Weights', w_tf)
tf.summary.scalar('Loss', tf.reshape(Loss_tf, []))
tf.summary.scalar('Bias', tf.reshape(b_tf, []))
merged_summary = tf.summary.merge_all()
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for Epoch in range(MaxEpochs):
        for Sample in range(len(XTrain)):
            x = XTrain[Sample]
            y = YTrain[Sample].reshape([-1, 1])
            Train_sample = {x_tf: x, y_tf: y}
            MergedSumm, _ = sess.run([merged_summary, train_step], feed_dict = Train_sample)
            writer.add_summary(summary = MergedSumm, global_step = Sample)
        if((Epoch + 1) % ValidationInterval == 0):
            ValidationError = 0
            for Sample in range(len(XTest)):
                x = XTest[Sample]
                y = YTest[Sample].reshape([-1, 1])
                Test_sample = {x_tf: x, y_tf: y}
                yPred = sess.run(yPred_tf, feed_dict = Test_sample)
                ValidationError += abs(yPred - YTest[Sample])
            print('Validation Error at', Epoch + 1, 'Epoch:', ValidationError)
    writer.add_graph(tf.Session().graph)
    writer.close()
toc = time.time()
print('\nElapsed time is: ', toc - tic, 's')
Answer 0 (score: 2)
Looking at the bias value, it looks like you are seeing saturation of the sigmoid function.
This happens when you push the sigmoid input (z_tf) to the extreme ends of the sigmoid function. When that happens, the gradient returned is so low that training stalls. The probable cause is that you have doubled up on sigmoid functions: sigmoid_cross_entropy_with_logits applies a sigmoid to its input, but you have already applied one yourself. Try removing one of them.
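A minimal sketch of that fix, reusing the tensors defined in the question: keep tf.sigmoid only for producing predictions, and hand the raw logits z_tf to the loss, which applies its own sigmoid internally:

# z_tf holds the raw logits
z_tf = tf.add(tf.matmul(w_tf, x_tf, transpose_a = True), b_tf, name = 'z_tf')
# Keep an explicit sigmoid only for evaluating predictions
yPred_tf = tf.sigmoid(z_tf, name = 'yPred_tf')
# Pass z_tf (not yPred_tf) so the sigmoid is applied exactly once
Loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(logits = z_tf, labels = y_tf, name = 'Loss_tf')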
In addition, tf.initializers.random_uniform() produces random values between 0 and 1 by default. You probably want to initialize the weights symmetrically around 0 and starting from very small values. You can do this by passing the minval and maxval arguments to tf.initializers.random_uniform(). The weights should then grow during training, and this again helps prevent the sigmoid from saturating.
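For example (a sketch only: the bound 0.01 is an illustrative choice rather than a tuned value, and initializing the bias to zero is an assumption of this sketch, not part of the answer above):

# Small, zero-centred uniform initialization in [-0.01, 0.01]; 0.01 is illustrative
w_tf = tf.get_variable(name = "w_tf", shape = [n_x, 1],
                       initializer = tf.initializers.random_uniform(minval = -0.01, maxval = 0.01))
# Zero bias is a common default here (an assumption, not from the answer)
b_tf = tf.get_variable(name = "b_tf", shape = [n_y, 1],
                       initializer = tf.initializers.zeros())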