Simple multilayer perceptron model does not converge in TensorFlow

Date: 2016-01-12 20:15:42

Tags: deep-learning tensorflow

I'm new to TensorFlow. Today I tried to implement my first model in TF, but it returned strange results. I know I'm missing something here, but I can't figure out what. Here is the story.

The model

I have a simple multilayer perceptron model with a single hidden layer, applied to the MNIST database. The layers are [input (784), hidden_layer (470), output_layer (10)], with tanh as the hidden-layer non-linearity and a softmax loss at the output layer. The optimizer is gradient descent with a learning rate of 0.01, and my mini-batch size is 1 (I'm training the model sample by sample).
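
Concretely, the forward pass and loss described above can be written out in plain NumPy (a sketch only; the variable names here are illustrative, not taken from the TensorFlow code below):

    import numpy as np

    def forward(x, W1, b1, W2, b2):
        h = np.tanh(x.dot(W1) + b1)          # hidden layer: (784,) -> (470,), tanh non-linearity
        logits = h.dot(W2) + b2              # output layer: (470,) -> (10,)
        e = np.exp(logits - logits.max())    # numerically stable softmax
        return e / e.sum()

    def cross_entropy(p, y_onehot):
        # softmax loss: negative log-probability of the true class
        return -np.log(p[y_onehot.argmax()])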

My implementation:

  1. First, I implemented my model in C++ and got around 96% accuracy. Here is the repository: https://github.com/amin2ros/Artificog
  2. Then I implemented the exact same model in TensorFlow, but surprisingly the model does not converge at all. The code is below.

    import sys
    import input_data  # MNIST loader helper from the TensorFlow tutorials
    import tensorflow as tf
    mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
    # Parameters
    learning_rate = 0.01  # gradient-descent step size (0.01, as described above)
    training_epochs = 1
    batch_size = 1
    display_step = 1
    # Network Parameters
    n_hidden_1 = 470 # 1st layer num features
    n_input = 784 # MNIST data input (img shape: 28*28)
    n_classes = 10 # MNIST total classes (0-9 digits)
    # tf Graph input
    x = tf.placeholder("float", [None, n_input])
    y = tf.placeholder("float", [None, n_classes])
    # Create model
    def multilayer_perceptron(_X, _weights, _biases):
        layer_1 = tf.tanh(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1'])) 
        return tf.matmul(layer_1, _weights['out']) + _biases['out']
    # Store layers weight & bias
    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
        'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
    }
    biases = {
        'b1': tf.Variable(tf.random_normal([n_hidden_1])),
        'out': tf.Variable(tf.random_normal([n_classes]))
    }
    # Construct model
    pred = multilayer_perceptron(x, weights, biases)
    # Define loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax(pred)) # BUG: this is not a real loss -- see the edit below
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    # Initializing the variables
    init = tf.initialize_all_variables()
    # Launch the graph
    with tf.Session() as sess:
        sess.run(init)
        # Training cycle
        for epoch in range(training_epochs):
            avg_cost = 0.
            m = 0   # running count of misclassified samples
            total_batch = int(mnist.train.num_examples / batch_size)
            counter = 0
            #print 'count = ' , total_batch
            #sys.stdin.read(1)
            # Loop over all batches
            for i in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)
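                # NOTE: tf.argmax(...).eval() below and the tf.not_equal /
                # tf.cast calls create new graph ops on every iteration, so
                # this loop slows down over time (see the sketch after this
                # code block)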
                label = tf.argmax(batch_ys,1).eval()[0] 
                counter+=1
                sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
                wrong_prediction = tf.not_equal(tf.argmax(pred, 1), tf.argmax(y, 1))
                missed=tf.cast(wrong_prediction, "float")
                m += missed.eval({x: batch_xs, y: batch_ys})[0]
                print "Sample #", counter , " - Label : " , label , " - Prediction :" , tf.argmax(pred, 1).eval({x: batch_xs, y: batch_ys})[0]  ,\
                 "- Missed = " , m ,  " - Error Rate = " , 100 * float(m)/counter
        print "Optimization Finished!"
    

I'm curious why this happens. Any help is appreciated.
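
A side note, unrelated to the convergence problem: the loop above creates new tf.argmax / tf.not_equal / tf.cast ops on every sample, so the graph keeps growing and each iteration gets slower. A minimal sketch of the usual pattern, assuming the same pred, x, and y as in the code above, builds the evaluation ops once before training:

    # build the evaluation ops a single time, before the training loop
    prediction = tf.argmax(pred, 1)
    wrong_prediction = tf.not_equal(prediction, tf.argmax(y, 1))
    missed = tf.cast(wrong_prediction, "float")

    # inside the loop, only run the already-built ops, e.g.:
    #   p, miss = sess.run([prediction, missed],
    #                      feed_dict={x: batch_xs, y: batch_ys})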

Edit:

As mentioned below, the cost function was defined incorrectly. It should instead be:

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
    

Now the model converges :)
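
For anyone wondering why the original cost could never work: tf.nn.softmax(pred) returns rows that each sum to 1, so tf.reduce_mean over the whole output is always exactly 1/n_classes (0.1 here), regardless of the weights. A constant cost means zero gradients, so the weights never move. A quick standalone NumPy check of this (not TensorFlow code):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    logits = np.random.randn(5, 10)  # any logits: 5 samples, 10 classes
    print(softmax(logits).mean())    # always 0.1 (= 1/10)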

0 Answers:

No answers yet.