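ReluGrad input is not finite on a multi-layer network in TensorFlow

Time: 2016-03-09 19:57:30

Tags: python tensorflow deep-learning

I'm taking the Udacity TensorFlow course and trying to train a neural network on the notMNIST dataset.
With a network that has one hidden layer everything works fine, but when I try to add another layer I get this error after about 150 steps: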

InvalidArgumentError: ReluGrad input is not finite. : Tensor had NaN values

Here is the network model:

def model(x, w_h,w_h2,w_0,b_h,b_h2,b_0,p_drop):
    # Note: tf.nn.dropout treats p_drop as the probability of KEEPING a unit.
    h = tf.nn.relu(tf.matmul(x,w_h)+b_h)
    h = tf.nn.dropout(h,p_drop)
    h2 = tf.nn.relu(tf.matmul(h, w_h2)+b_h2)
    h2 = tf.nn.dropout(h2,p_drop)
    return (tf.matmul(h2,w_0)+b_0)
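One detail about the model above: in the TensorFlow 1.x API used here, the second argument of tf.nn.dropout is the probability of keeping a unit (keep_prob), not of dropping it, and the kept activations are scaled by 1 / keep_prob. With the value 0.5 passed in the question both readings coincide. A minimal pure-Python sketch of that behaviour follows; the helper name inverted_dropout is hypothetical and is not a TensorFlow function.

```python
import random


def inverted_dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale it by 1/keep_prob.

    Illustrative re-implementation only (hypothetical helper).
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = inverted_dropout(activations, 0.5, rng)

# Every output is either zeroed or doubled, so the mean stays near the input mean.
mean_after = sum(dropped) / len(dropped)
print(sorted(set(dropped)), mean_after)
```

Because of the rescaling, the expected activation is the same with and without dropout, so dropout alone should not change the overall scale of the logits.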

The error points to one specific line:

h = tf.nn.relu(tf.matmul(x,w_h)+b_h)

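My guess is that with the two-layer network w_h becomes very small, so the matmul product goes to zero, but I don't understand how to fix it.

Note that I am using this optimizer: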

net = model(tf_train_dataset,w_h,w_h2,w_0,b_h,b_h2,b_0,0.5)
loss = tf.reduce_mean(
       tf.nn.softmax_cross_entropy_with_logits(logits=net, labels=tf_train_labels))
global_step = tf.Variable(0)  # count the number of steps taken.
learning_rate = tf.train.exponential_decay(0.5, global_step, 100, 0.95)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=global_step)
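For reference, with the default staircase=False, tf.train.exponential_decay computes learning_rate * decay_rate ** (global_step / decay_steps). The pure-Python sketch below re-implements that formula for illustration (it is not the TensorFlow function) and evaluates it with the arguments used above.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    """Illustrative re-implementation of the exponential decay formula."""
    if staircase:
        exponent = step // decay_steps   # decay in discrete jumps
    else:
        exponent = step / decay_steps    # smooth decay (the default)
    return initial_rate * decay_rate ** exponent


# Same arguments as in the question: start at 0.5, multiply by 0.95 every 100 steps.
rate_at = {step: exponential_decay(0.5, step, 100, 0.95) for step in (0, 100, 150, 1000)}
for step, rate in rate_at.items():
    print(step, round(rate, 4))
```

At step 150, where the error shows up, the rate is still roughly 0.46, so the schedule by itself does little to soften the early updates.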

The network is 784 -> 1024 -> 512 -> 10
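To see why weight scale matters for a stack of this size, the sketch below pushes one random input through the 784 -> 1024 -> 512 -> 10 layers and reports the standard deviation of each layer's pre-activations. It is only an illustration under assumptions the question does not confirm: weights drawn from a unit-variance normal (the actual initialization is not shown), inputs uniform in [-0.5, 0.5), biases omitted, no dropout. For comparison it also uses a normal with variance 2 / (fan_in + fan_out), the Glorot/Xavier scaling. All helper names are hypothetical.

```python
import math
import random


def preactivation_stds(layer_sizes, weight_std_fn, rng):
    """Return the std of the pre-activations of each layer for one random input."""
    # Assumed stand-in for one normalized image.
    x = [rng.random() - 0.5 for _ in range(layer_sizes[0])]
    stds = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        std = weight_std_fn(fan_in, fan_out)
        # Each output unit is a dot product of x with a freshly drawn weight column.
        z = [sum(rng.gauss(0.0, std) * xi for xi in x) for _ in range(fan_out)]
        mean = sum(z) / fan_out
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in z) / fan_out))
        x = [max(0.0, v) for v in z]  # ReLU
    return stds


sizes = [784, 1024, 512, 10]
naive = preactivation_stds(sizes, lambda fi, fo: 1.0, random.Random(0))
scaled = preactivation_stds(sizes, lambda fi, fo: math.sqrt(2.0 / (fi + fo)), random.Random(0))
print("unit-variance weights:", [round(s, 1) for s in naive])
print("Glorot-scaled weights:", [round(s, 3) for s in scaled])
```

With unit-variance weights the pre-activations grow by a large factor at every layer, while with the scaled weights they stay below 1. The weight gradients scale with those activations, so with a step size of 0.5 the weights can plausibly overflow to inf or NaN within a few hundred steps. This does not prove that it is the cause here, but it is consistent with the error appearing only after a layer is added.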

Any help would be greatly appreciated...

1 answer:

Answer 0 (score: 0):

I ran into the same problem with randomly initialized weights and biases. Switching to Xavier initialization (Glorot and Bengio) solved it. Here is my full example:

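import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib)

# image_size, num_labels, valid_dataset and test_dataset are assumed to be
# defined earlier; they are not shown in this answer.
hidden_size = 1024
batch_size = 256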

def multilayer(x, w, b):
    for i, (wi, bi) in enumerate(zip(w, b)):
        if i == 0:
            out = tf.nn.relu(tf.matmul(x, wi) + bi)
        elif i == len(w) - 1:
            out = tf.matmul(out, wi) + bi
        else:
            out = tf.nn.relu(tf.matmul(out, wi) + bi)
    print(out.shape, x.shape)
    return out

graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Xavier (Glorot and Bengio) initializer
    initializer = tf.contrib.layers.xavier_initializer()

    # Variables
    W1 = tf.Variable(initializer([image_size * image_size, hidden_size]))
    b1 = tf.Variable(initializer([hidden_size]))
    W2 = tf.Variable(initializer([hidden_size, hidden_size]))
    b2 = tf.Variable(initializer([hidden_size]))
    W3 = tf.Variable(initializer([hidden_size, hidden_size]))
    b3 = tf.Variable(initializer([hidden_size]))
    W4 = tf.Variable(initializer([hidden_size, hidden_size]))
    b4 = tf.Variable(initializer([hidden_size]))
    W5 = tf.Variable(initializer([hidden_size, num_labels]))
    b5 = tf.Variable(initializer([num_labels]))

    Ws = [W1, W2, W3, W4, W5]
    bs = [b1, b2, b3, b4, b5]

    # Training computation
    logits = multilayer(tf_train_dataset, Ws, bs)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    #NOTE loss is actually a scalar value that represents the effectiveness of the
    #     current prediction. A minimized loss means that the weights and biases
    #     are adjusted at their best for the training data.

    # Optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(multilayer(tf_valid_dataset, Ws, bs))
    test_prediction = tf.nn.softmax(multilayer(tf_test_dataset, Ws, bs))
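For a sense of what the initializer in the example does: with its default uniform=True, tf.contrib.layers.xavier_initializer draws weights uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The sketch below computes that limit for the layer shapes used above. It assumes image_size = 28 and num_labels = 10 (taken from the 784 and 10 in the question, not stated in the answer), and the helper names are hypothetical.

```python
import math
import random


def xavier_uniform_limit(fan_in, fan_out):
    """Half-width of the uniform range used by Xavier/Glorot uniform init."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform(fan_in, fan_out, rng):
    """Return a fan_in x fan_out matrix (list of rows) of Xavier-uniform weights."""
    limit = xavier_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Assumed values, see the note above.
image_size, hidden_size, num_labels = 28, 1024, 10
shapes = [(image_size * image_size, hidden_size),
          (hidden_size, hidden_size),
          (hidden_size, num_labels)]
for fan_in, fan_out in shapes:
    print((fan_in, fan_out), round(xavier_uniform_limit(fan_in, fan_out), 4))

sample = xavier_uniform(4, 3, random.Random(0))  # tiny matrix, just to show the shape
```

All of these limits are well below 0.1, which keeps the initial activations and logits small. Note that tf.contrib was removed in TensorFlow 2.x; there, tf.keras.initializers.GlorotUniform is the corresponding initializer.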