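I am currently learning how to use TensorFlow, and I am having some trouble implementing this softmax regression application.
The code runs without errors, but for some reason the validation and test predictions show no improvement; only the training predictions improve.
I am using stochastic gradient descent (SGD) with mini-batches to converge faster, but I do not know whether that could be causing the problem.
I would appreciate it if you could share some ideas. Here is the complete code: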
import input_data
import numpy as np
import random as ran
import tensorflow as tf
import matplotlib.pyplot as plt
mnist = input_data.read_data_sets('MNIST_Data/', one_hot=True)
#Features & Data
num_features = 784
num_labels = 10
learning_rate = 0.05
batch_size = 128
num_steps = 5001
train_dataset = mnist.train.images
train_labels = mnist.train.labels
test_dataset = mnist.test.images
test_labels = mnist.test.labels
valid_dataset = mnist.validation.images
valid_labels = mnist.validation.labels
graph = tf.Graph()
with graph.as_default():
    tf_train_data = tf.placeholder(tf.float32, shape=(batch_size, num_features))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_data = tf.constant(valid_dataset)
    tf_test_data = tf.constant(test_dataset)
    W = tf.Variable(tf.truncated_normal([num_features, num_labels]))
    b = tf.Variable(tf.zeros([num_labels]))
    score_vector = tf.matmul(tf_train_data, W) + b
    cost_func = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=tf_train_labels, logits=score_vector))
    score_valid = tf.matmul(tf_test_data, W) + b
    score_test = tf.matmul(tf_valid_data, W) + b
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_func)
    train_pred = tf.nn.softmax(score_vector)
    valid_pred = tf.nn.softmax(score_valid)
    test_pred = tf.nn.softmax(score_test)
def accuracy(predictions, labels):
    correct_pred = np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
    accu = (100.0 * correct_pred) / predictions.shape[0]
    return accu
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    print("Initialized")
    for step in range(num_steps):
        offset = np.random.randint(0, train_labels.shape[0] - batch_size - 1)
        batch_data = train_dataset[offset:(offset+batch_size), :]
        batch_labels = train_labels[offset:(offset+batch_size), :]
        feed_dict = {tf_train_data : batch_data,
                     tf_train_labels : batch_labels
                     }
        _, l, predictions = sess.run([optimizer, cost_func, train_pred],
                                     feed_dict=feed_dict)
        if (step % 500 == 0):
            print("Minibatch loss at step {0}: {1}".format(step, l))
            print("Minibatch accuracy: {:.1f}%".format(
                accuracy(predictions, batch_labels)))
            print("Validation accuracy: {:.1f}%".format(
                accuracy(valid_pred.eval(), valid_labels)))
    print("\nTest accuracy: {:.1f}%".format(
        accuracy(test_pred.eval(), test_labels)))
Answer 0 (score: 0)
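This sounds like overfitting, which is not surprising given that the model is essentially a linear one (softmax regression).
You could try a few options:
1. Add hidden layers plus an activation function (the ELU paper, https://arxiv.org/abs/1511.07289, uses a vanilla DNN on the MNIST dataset).
2. Use a CNN or an RNN, although a CNN is better suited to image problems.
3. Use a better optimizer. If you are new to this, try the Adam optimizer first (https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer), then move on to momentum with Nesterov (https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer).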
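To make option 1 concrete, here is a minimal NumPy sketch of the forward pass of a network with one hidden layer and an ELU activation. It only illustrates the shape of the computation, not a trained model: the hidden size of 256, the weight scale, and the helper names `elu` and `forward` are assumptions made for the example, and the random batch stands in for real MNIST images.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    # np.minimum keeps exp() from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def forward(x, W1, b1, W2, b2):
    """Logits of a one-hidden-layer network: x -> ELU(x W1 + b1) -> logits."""
    hidden = elu(x @ W1 + b1)
    return hidden @ W2 + b2

# 784 inputs and 10 classes as in the question; 256 hidden units is an
# arbitrary choice for illustration.
num_features, num_hidden, num_labels = 784, 256, 10
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.05, size=(num_features, num_hidden))
b1 = np.zeros(num_hidden)
W2 = rng.normal(scale=0.05, size=(num_hidden, num_labels))
b2 = np.zeros(num_labels)

batch = rng.random((128, num_features))  # stand-in for a batch of images
logits = forward(batch, W1, b1, W2, b2)
print(logits.shape)  # (128, 10)
```

In the TensorFlow graph from the question, the same idea would mean adding a second pair of weight and bias variables and applying an activation between the two matrix multiplications.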
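Without feature engineering, it is hard to do image classification with a linear model alone. Also, you do not need to run softmax on the results before computing accuracy: softmax is a smooth version of argmax and preserves the ordering of the scores, so the argmax of the logits is the same as the argmax of the softmax output. Finally, you should use (None, num_features) as the placeholder shape to allow a variable batch size. That way you can feed the validation and test datasets directly through feed_dict instead of creating extra tensors.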
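The point about softmax can be checked with a small NumPy sketch that reuses the question's accuracy() with one assumed change: it raises an error when the number of predictions and labels differ. That check may matter here, because in the question's graph score_valid is computed from tf_test_data and score_test from tf_valid_data, so each prediction array appears to be compared against the other split's labels. If the two splits differ in size, a guarded accuracy() would fail loudly rather than report a meaningless number. The logits and labels below are made up for the example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def accuracy(predictions, labels):
    """Percentage of rows whose argmax matches the one-hot label."""
    # Assumed addition: refuse to compare arrays with different row counts.
    if predictions.shape[0] != labels.shape[0]:
        raise ValueError(
            "row mismatch: {} predictions vs {} labels".format(
                predictions.shape[0], labels.shape[0]))
    correct = np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
    return 100.0 * correct / predictions.shape[0]

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 10))                  # made-up scores
labels = np.eye(10)[rng.integers(0, 10, size=6)]   # made-up one-hot labels

# Softmax preserves the ordering within each row, so the predicted class
# and therefore the accuracy are the same with or without it.
print(accuracy(logits, labels) == accuracy(softmax(logits), labels))
```

With a placeholder of shape (None, num_features), the same score tensor can be evaluated on the training batch, the validation set, and the test set by changing only the feed_dict, which also removes the chance of wiring the wrong constant into the wrong score.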