Simple multilayer perceptron model does not converge in TensorFlow

Date: 2016-01-12 20:15:42

Tags: deep-learning tensorflow

I'm new to TensorFlow. Today I tried to implement my first model in TF, but it returned strange results. I know I'm missing something here, but I can't figure out what. Here is the story.

The model

I have a simple multilayer perceptron model with a single hidden layer, applied to the MNIST database. The layers are [input (784), hidden_layer (470), output_layer (10)], with tanh as the hidden-layer non-linearity and a softmax loss at the output layer. The optimizer is gradient descent with a learning rate of 0.01, and my mini-batch size is 1 (I'm training the model sample by sample).
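
Concretely, the forward pass and loss described above can be written out in plain NumPy (a sketch only; the variable names here are illustrative, not taken from the TensorFlow code below):

    import numpy as np

    def forward(x, W1, b1, W2, b2):
        h = np.tanh(x.dot(W1) + b1)          # hidden layer: (784,) -> (470,), tanh non-linearity
        logits = h.dot(W2) + b2              # output layer: (470,) -> (10,)
        e = np.exp(logits - logits.max())    # numerically stable softmax
        return e / e.sum()

    def cross_entropy(p, y_onehot):
        # softmax loss: negative log-probability of the true class
        return -np.log(p[y_onehot.argmax()])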

My implementation:

  1. First, I implemented my model in C++ and got around 96% accuracy. Here is the repository: https://github.com/amin2ros/Artificog
  2. Then I implemented the exact same model in TensorFlow, but surprisingly the model does not converge at all. The code is below.

    import sys
    import input_data  # MNIST loader helper from the TensorFlow tutorials
    import tensorflow as tf
    mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
    # Parameters
    learning_rate = 0.01  # gradient-descent step size (0.01, as described above)
    training_epochs = 1
    batch_size = 1
    display_step = 1
    # Network Parameters
    n_hidden_1 = 470 # 1st layer num features
    n_input = 784 # MNIST data input (img shape: 28*28)
    n_classes = 10 # MNIST total classes (0-9 digits)
    # tf Graph input
    x = tf.placeholder("float", [None, n_input])
    y = tf.placeholder("float", [None, n_classes])
    # Create model
    def multilayer_perceptron(_X, _weights, _biases):
        layer_1 = tf.tanh(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1'])) 
        return tf.matmul(layer_1, _weights['out']) + _biases['out']
    # Store layers weight & bias
    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
        'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
    }
    biases = {
        'b1': tf.Variable(tf.random_normal([n_hidden_1])),
        'out': tf.Variable(tf.random_normal([n_classes]))
    }
    # Construct model
    pred = multilayer_perceptron(x, weights, biases)
    # Define loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax(pred)) # BUG: this is not a real loss -- see the edit below
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    # Initializing the variables
    init = tf.initialize_all_variables()
    # Launch the graph
    with tf.Session() as sess:
        sess.run(init)
        # Training cycle
        for epoch in range(training_epochs):
            avg_cost = 0.
            m = 0   # running count of misclassified samples
            total_batch = int(mnist.train.num_examples / batch_size)
            counter = 0
            #print 'count = ' , total_batch
            #sys.stdin.read(1)
            # Loop over all batches
            for i in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)
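                # NOTE: tf.argmax(...).eval() below and the tf.not_equal /
                # tf.cast calls create new graph ops on every iteration, so
                # this loop slows down over time (see the sketch after this
                # code block)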
                label = tf.argmax(batch_ys,1).eval()[0] 
                counter+=1
                sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
                wrong_prediction = tf.not_equal(tf.argmax(pred, 1), tf.argmax(y, 1))
                missed=tf.cast(wrong_prediction, "float")
                m += missed.eval({x: batch_xs, y: batch_ys})[0]
                print "Sample #", counter , " - Label : " , label , " - Prediction :" , tf.argmax(pred, 1).eval({x: batch_xs, y: batch_ys})[0]  ,\
                 "- Missed = " , m ,  " - Error Rate = " , 100 * float(m)/counter
        print "Optimization Finished!"
    

I'm curious why this happens. Any help is appreciated.
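
A side note, unrelated to the convergence problem: the loop above creates new tf.argmax / tf.not_equal / tf.cast ops on every sample, so the graph keeps growing and each iteration gets slower. A minimal sketch of the usual pattern, assuming the same pred, x, and y as in the code above, builds the evaluation ops once before training:

    # build the evaluation ops a single time, before the training loop
    prediction = tf.argmax(pred, 1)
    wrong_prediction = tf.not_equal(prediction, tf.argmax(y, 1))
    missed = tf.cast(wrong_prediction, "float")

    # inside the loop, only run the already-built ops, e.g.:
    #   p, miss = sess.run([prediction, missed],
    #                      feed_dict={x: batch_xs, y: batch_ys})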

Edit:

As mentioned below, the cost function was defined incorrectly. It should instead be:

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
    

Now the model converges :)
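
For anyone wondering why the original cost could never work: tf.nn.softmax(pred) returns rows that each sum to 1, so tf.reduce_mean over the whole output is always exactly 1/n_classes (0.1 here), regardless of the weights. A constant cost means zero gradients, so the weights never move. A quick standalone NumPy check of this (not TensorFlow code):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    logits = np.random.randn(5, 10)  # any logits: 5 samples, 10 classes
    print(softmax(logits).mean())    # always 0.1 (= 1/10)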

0 Answers:

No answers yet.