I am replicating, in TensorFlow, a neural network for the MNIST dataset that I had previously programmed in skflow. This is the model in skflow:
import tensorflow.contrib.learn as skflow
from sklearn import metrics
from sklearn.datasets import fetch_mldata
from sklearn.cross_validation import train_test_split
mnist = fetch_mldata('MNIST original')
train_dataset, test_dataset, train_labels, test_labels = train_test_split(
    mnist.data, mnist.target, test_size=10000, random_state=42)
classifier = skflow.TensorFlowDNNClassifier(
    hidden_units=[1200, 1200], n_classes=10, optimizer="SGD",
    learning_rate=0.01, batch_size=128, steps=1000)
classifier.fit(train_dataset, train_labels)
score = metrics.accuracy_score(test_labels, classifier.predict(test_dataset))
print("Accuracy: %f" % score)
This model achieves an accuracy of 0.950600.
However, the replicated model in TensorFlow gets NaN in the loss function and fails to improve (I believe it is not related to Tensorflow NaN bug?, since I am using tf.nn.softmax_cross_entropy_with_logits).
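(For context, the linked NaN bug is about computing the cross-entropy by hand, where the log of a softmax that has underflowed to zero gives -inf. A minimal sketch of both variants, using the logits and tf_train_labels names from the code below; only the fused form appears in my graph:)

    # Hand-rolled cross-entropy: tf.nn.softmax can underflow to exactly 0,
    # and tf.log(0) = -inf, which turns the loss into NaN
    unstable_loss = -tf.reduce_mean(
        tf.reduce_sum(tf_train_labels * tf.log(tf.nn.softmax(logits)), 1))
    # Fused op (what my code uses): works on the logits directly,
    # so no explicit log(softmax) ever appears
    stable_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))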
I cannot figure out why, since the setup of the model in TensorFlow is the same as in skflow. The only thing I am not sure is identical is how skflow initializes the weights of the network; I searched for that part in skflow's code, but I have not found it.
Here is the code in TensorFlow:
import numpy as np
import tensorflow as tf
from sklearn.cross_validation import train_test_split
from sklearn.datasets import fetch_mldata

mnist = fetch_mldata('MNIST original')
num_labels = len(np.unique(mnist.target))
num_pixels = mnist.data.shape[1]

# Reshape labels to one-hot encoding
labels = (np.arange(num_labels) == mnist.target[:, None]).astype(np.float32)

# Create train_dataset of 60000 and test_dataset of 10000 elements
train_dataset, test_dataset, train_labels, test_labels = train_test_split(
    mnist.data, labels, test_size=10000, random_state=42)

def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

batch_size = 128

graph = tf.Graph()
with graph.as_default():
    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, num_pixels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_test_dataset = tf.cast(tf.constant(test_dataset), tf.float32)

    w_hidden = tf.Variable(tf.truncated_normal([num_pixels, 1200]))
    b_hidden = tf.Variable(tf.zeros([1200]))
    hidden = tf.nn.relu(tf.matmul(tf_train_dataset, w_hidden) + b_hidden)

    w_hidden_2 = tf.Variable(tf.truncated_normal([1200, 1200]))
    b_hidden_2 = tf.Variable(tf.zeros([1200]))
    hidden2 = tf.nn.relu(tf.matmul(hidden, w_hidden_2) + b_hidden_2)

    w = tf.Variable(tf.truncated_normal([1200, num_labels]))
    b = tf.Variable(tf.zeros([num_labels]))
    logits = tf.matmul(hidden2, w) + b

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    # Predictions for the training and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(
        tf.matmul(tf.nn.relu(
            tf.matmul(tf.nn.relu(
                tf.matmul(tf_test_dataset, w_hidden) + b_hidden),
                w_hidden_2) + b_hidden_2),
            w) + b)

num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 100 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
I am clueless about what the problem could be. Any suggestions?
Edit 1: As suggested, I tried replacing the tf.Variable calls with tf.get_variable("w_hidden", [num_pixels, 1200]), but I still got NaNs.
I also used the skflow.ops.dnn op to do the layers, with my own loss and so on, and still got NaNs.
Edit 2: It turns out this was not a problem of weight initialization. It seems the gradients were too unstable (in the TensorFlow model), which caused the loss to become NaN. As in Adding multiple layers to TensorFlow causes loss function to become Nan, I lowered the learning rate by an order of magnitude, and it worked.
What I do not understand now is what differs between skflow's SGD optimizer and the one above. Or, if they "look" equal, what explains that they need different learning rates?
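For reference, the fix was a one-line change to the graph above (0.001, an order of magnitude below the original 0.01; the exact threshold presumably depends on the initialization scale):

    # Same SGD optimizer, with the learning rate lowered by an order of magnitude
    optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(loss)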
Answer 0 (score: 0):
Initialization in skflow relies on tf.get_variable's default initializer, uniform_unit_scaling_initializer (see this for a detailed explanation).
You can try replacing your tf.Variable calls with something like tf.get_variable("w_hidden", [num_pixels, 1200]).
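For illustration, a minimal sketch of that replacement (the explicit initializer on the second variable only spells out the default and is otherwise redundant):

    # With no initializer argument, tf.get_variable falls back to
    # uniform_unit_scaling_initializer, which scales the range by fan-in
    w_hidden = tf.get_variable("w_hidden", [num_pixels, 1200])
    # The same default, written out explicitly for the second layer
    w_hidden_2 = tf.get_variable(
        "w_hidden_2", [1200, 1200],
        initializer=tf.uniform_unit_scaling_initializer())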
An alternative, to begin with, is to use the skflow.ops.dnn op, which does the layers for you, while you still write the loss and so on yourself.
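A minimal sketch of that variant, assuming the skflow.ops.dnn(tensor_in, hidden_units) signature of that era (ReLU activation by default); the output layer and loss stay hand-written, and "w_out" is just an illustrative variable name:

    import tensorflow.contrib.learn as skflow

    # Two hidden layers built by skflow; weights come from tf.get_variable,
    # so they pick up the uniform_unit_scaling_initializer default
    hidden2 = skflow.ops.dnn(tf_train_dataset, [1200, 1200])
    # Output layer and loss are still defined by hand, as before
    w = tf.get_variable("w_out", [1200, num_labels])
    b = tf.Variable(tf.zeros([num_labels]))
    logits = tf.matmul(hidden2, w) + b
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))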
Also, please let me know if you have a clear use case that forced you to rewrite things in pure TensorFlow instead of using skflow; I would love to address it. You can always write a custom model by passing model_fn to TensorFlowEstimator and still use the training/batching/saving and other functionality.