I'm working through the TensorFlow Udacity course and trying to train a neural network on the notMNIST dataset.
With a single hidden layer everything works fine, but when I try to add a second layer I get this error after roughly 150 steps:
InvalidArgumentError: ReluGrad input is not finite. : Tensor had NaN values
Here is the network model:
def model(x, w_h, w_h2, w_0, b_h, b_h2, b_0, p_drop):
    h = tf.nn.relu(tf.matmul(x, w_h) + b_h)
    h = tf.nn.dropout(h, p_drop)
    h2 = tf.nn.relu(tf.matmul(h, w_h2) + b_h2)
    h2 = tf.nn.dropout(h2, p_drop)
    return tf.matmul(h2, w_0) + b_0
The error points to one specific line:
    h = tf.nn.relu(tf.matmul(x, w_h) + b_h)
I guess that with the two-layer network w_h becomes very small, so the matmul product goes to zero, but I don't understand how to fix it. Note that I'm using this optimizer:
net = model(tf_train_dataset, w_h, w_h2, w_0, b_h, b_h2, b_0, 0.5)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(net, tf_train_labels))

global_step = tf.Variable(0)  # count the number of steps taken
learning_rate = tf.train.exponential_decay(0.5, global_step, 100, 0.95)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
The network is 784 -> 1024 -> 512 -> 10.
Any help would be appreciated...
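In case it helps to reproduce, below is a rough sketch (my own addition, not course code) of how the graph could be instrumented with tf.add_check_numerics_ops so that the failing run names the first tensor that becomes non-finite. It assumes tf_train_dataset / tf_train_labels are placeholders fed with minibatches as in the course notebook; num_steps and next_batch are placeholder names I made up:

check_op = tf.add_check_numerics_ops()  # asserts every float tensor in the graph is finite

with tf.Session() as session:
    tf.global_variables_initializer().run()
    for step in range(num_steps):  # num_steps: hypothetical step count
        batch_data, batch_labels = next_batch()  # next_batch: hypothetical minibatch helper
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        # Running check_op alongside the optimizer makes the error report the
        # exact op whose output is NaN/Inf, instead of only ReluGrad.
        _, _, l = session.run([optimizer, check_op, loss], feed_dict=feed_dict)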
Answer 0 (score: 0)
I ran into the same problem when my weights were initialized randomly and the biases set to zero. Using Xavier and Yoshua's initialization solved the problem; here is my full example:
hidden_size = 1024
batch_size = 256

def multilayer(x, w, b):
    for i, (wi, bi) in enumerate(zip(w, b)):
        if i == 0:
            out = tf.nn.relu(tf.matmul(x, wi) + bi)
        elif i == len(w) - 1:
            out = tf.matmul(out, wi) + bi
        else:
            out = tf.nn.relu(tf.matmul(out, wi) + bi)
    print(out.shape, x.shape)
    return out
graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Defining Xavier and Yoshua's initializer
    initializer = tf.contrib.layers.xavier_initializer()

    # Variables
    W1 = tf.Variable(initializer([image_size * image_size, hidden_size]))
    b1 = tf.Variable(initializer([hidden_size]))
    W2 = tf.Variable(initializer([hidden_size, hidden_size]))
    b2 = tf.Variable(initializer([hidden_size]))
    W3 = tf.Variable(initializer([hidden_size, hidden_size]))
    b3 = tf.Variable(initializer([hidden_size]))
    W4 = tf.Variable(initializer([hidden_size, hidden_size]))
    b4 = tf.Variable(initializer([hidden_size]))
    W5 = tf.Variable(initializer([hidden_size, num_labels]))
    b5 = tf.Variable(initializer([num_labels]))
    Ws = [W1, W2, W3, W4, W5]
    bs = [b1, b2, b3, b4, b5]

    # Training computation
    logits = multilayer(tf_train_dataset, Ws, bs)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    # NOTE: loss is a scalar value that represents the effectiveness of the
    # current prediction. A minimized loss means that the weights and biases
    # are adjusted as well as possible for the training data.

    # Optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(multilayer(tf_valid_dataset, Ws, bs))
    test_prediction = tf.nn.softmax(multilayer(tf_test_dataset, Ws, bs))
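For completeness, here is a minimal sketch of a training loop for this graph. The num_steps value and the offset-based minibatching follow the usual Udacity notebook pattern and are assumptions on my part, not part of the example above:

num_steps = 3001  # assumed value, not from the original example

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for step in range(num_steps):
        # Pick a minibatch; train_dataset / train_labels are assumed to exist
        # from the earlier parts of the notebook.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if step % 500 == 0:
            print("Minibatch loss at step %d: %f" % (step, l))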