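I am trying to write a neural network with one hidden layer in TensorFlow to classify the MNIST data. The hidden layer has 30 units (I also tried changing this, but the problem persists).
The problem: when I use no hidden layer and compute X*w + b directly, I get 85% accuracy, but when I add the layer as shown below, the accuracy stays at 0.113 and the cross-entropy loss stays around 2.3. I am sure I am making some silly mistake. Can someone point out what is wrong with the code?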
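import os
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import time
# `mnist` is used below but was not defined in the post as given.
# The data directory 'MNIST_data/' is an assumption; one_hot=True matches the shape of Y.
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)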
learning_rate = 0.01
batch_size = 128
n_epochs = 10
X = tf.placeholder(tf.float32, shape=(batch_size, 784))
Y = tf.placeholder(tf.float32, shape=(batch_size, 10))
w1 = tf.Variable(tf.zeros( [X.shape[1], 30]))
b1 = tf.Variable(tf.zeros([1, 30]))
z = tf.matmul(X,w1) + b1
a = tf.nn.relu(z)
w2 = tf.Variable(tf.zeros( [30, 10]))
b2 = tf.Variable(tf.zeros([1, 10]))
logits = tf.matmul(a,w2) + b2
entropy = tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = Y)
loss = tf.reduce_mean(entropy)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
with tf.Session() as sess:
    start_time = time.time()
    sess.run(tf.global_variables_initializer())
    n_batches = int(mnist.train.num_examples/batch_size)
    for i in range(n_epochs): # train the model n_epochs times
        total_loss = 0
        for _ in range(n_batches):
            X_batch, Y_batch = mnist.train.next_batch(batch_size)
            _, loss_batch = sess.run([optimizer, loss], feed_dict={X: X_batch, Y:Y_batch})
            total_loss += loss_batch
        print('Average loss epoch {0}: {1}'.format(i, total_loss/n_batches))
    print('Optimization Finished!') # should be around 0.35 after 25 epochs
    preds = tf.nn.softmax(logits)
    correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32))
    n_batches = int(mnist.test.num_examples/batch_size)
    total_correct_preds = 0
    for i in range(n_batches):
        X_batch, Y_batch = mnist.test.next_batch(batch_size)
        _, accuracy_batch = sess.run([correct_preds, accuracy], feed_dict={X: X_batch, Y:Y_batch})
        total_correct_preds += accuracy_batch
    print('Accuracy {0}'.format(total_correct_preds/mnist.test.num_examples))
Answer 0 (score: 2)
Try initializing the weights with random values instead of zeros, as described here:
https://www.tensorflow.org/get_started/mnist/pros#weight_initialization
w1 = tf.Variable(tf.truncated_normal([784, 30], stddev=0.1))
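To see why zeros stall training here: with w1, b1, w2 and b2 all zero, the ReLU output `a` is zero, so the gradient for w2 (the transpose of `a` times the logit gradient) is zero, and the gradient flowing back to w1 and b1 passes through w2 and is zero as well. Only b2 can change, which is consistent with a loss stuck near ln(10) ≈ 2.3 and an accuracy that looks like always predicting one digit. Below is a minimal NumPy sketch of that argument, not the original TensorFlow code. The helper `loss_and_grads`, the random stand-in data, and the plain normal draw used in place of `tf.truncated_normal` are all assumptions for illustration.

```python
import numpy as np

def loss_and_grads(w1, b1, w2, b2, X, Y):
    """Hypothetical helper: forward and backward pass of a 784-30-10 ReLU
    network with softmax cross-entropy, mirroring the graph in the question."""
    z = X @ w1 + b1
    a = np.maximum(z, 0.0)                      # ReLU
    logits = a @ w2 + b2
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(shifted)
    p /= p.sum(axis=1, keepdims=True)           # softmax probabilities
    loss = -np.mean(np.sum(Y * np.log(p), axis=1))

    dlogits = (p - Y) / X.shape[0]              # gradient of mean loss w.r.t. logits
    dz = (dlogits @ w2.T) * (z > 0)             # back through w2, then through ReLU
    grads = {
        "w1": X.T @ dz,
        "b1": dz.sum(axis=0, keepdims=True),
        "w2": a.T @ dlogits,
        "b2": dlogits.sum(axis=0, keepdims=True),
    }
    return loss, grads

# Stand-in data (an assumption): random "pixels" and random one-hot labels.
rng = np.random.default_rng(0)
X = rng.random((128, 784))
Y = np.eye(10)[rng.integers(0, 10, size=128)]

# Case 1: everything initialized to zero, as in the question.
zero_loss, zero_grads = loss_and_grads(
    np.zeros((784, 30)), np.zeros((1, 30)),
    np.zeros((30, 10)), np.zeros((1, 10)), X, Y)

# Case 2: small random weights, zero biases (normal draw stands in for truncated_normal).
rand_loss, rand_grads = loss_and_grads(
    rng.normal(0.0, 0.1, (784, 30)), np.zeros((1, 30)),
    rng.normal(0.0, 0.1, (30, 10)), np.zeros((1, 10)), X, Y)

print("zero init loss:", zero_loss)  # equals ln(10), about 2.3026
print("zero init, gradient non-zero?",
      {name: bool(g.any()) for name, g in zero_grads.items()})
print("random init, gradient non-zero?",
      {name: bool(g.any()) for name, g in rand_grads.items()})
```

The same change would apply to w2 in the question's code; the biases can stay at zero, since random weights are enough to give every parameter a non-zero gradient.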