Copying and pasting the code from TensorFlow's MNIST tutorial works just fine, resulting in ~92% accuracy, as expected.
When I instead read the MNIST data from a CSV file and convert it to a NumPy array via pd.DataFrame.values, the process breaks down: I get ~10% accuracy, i.e. no better than random.
Below is the code (the tutorial code works well; my CSV version fails to learn):
Working MNIST tutorial:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Not working (reads the CSV and feeds NumPy arrays):
import pandas as pd
from sklearn.cross_validation import train_test_split
import numpy as np
import tensorflow as tf
# read csv file
MNIST = pd.read_csv("/data.csv")
# pop label column and create training label array
train_label = MNIST.pop("label")
# converts from dataframe to np array
MNIST=MNIST.values
# convert train labels to one hots
train_labels = pd.get_dummies(train_label)
# make np array
train_labels = train_labels.values
x_train,x_test,y_train,y_test = train_test_split(MNIST,train_labels,test_size=0.2)
# we now have features (x_train) and y values, separated into test and train
# convert to dtype float 32
x_train,x_test,y_train,y_test = np.array(x_train,dtype='float32'), np.array(x_test,dtype='float32'),np.array(y_train,dtype='float32'),np.array(y_test,dtype='float32')
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
def get_mini_batch(x, y):
    # choose 100 random row indices (with replacement)
    rows = np.random.choice(x.shape[0], 100)
    # return arrays of 100 random rows (for features and labels)
    return x[rows], y[rows]
# train
for i in range(100):
    # get a mini batch
    a, b = get_mini_batch(x_train, y_train)
    # run the train step, feeding arrays of 100 rows each time
    sess.run(train_step, feed_dict={x: a, y_: b})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: x_test, y_: y_test}))
Help would be greatly appreciated. (CSV file here.)
Answer 0 (score: 0)
Did you try training for more iterations? The original code trains for 1000 iterations:
for i in range(1000):
whereas the CSV code only trains for 100:
for i in range(100):
If that's not the reason, it would be helpful if you could also share your CSV file, so that we can easily test your code.
Edit:
I have tested your code, and the problem seems to be caused by numerical instabilities in the naive cross_entropy calculation (see this SO question). Replacing your cross_entropy definition with the following line should resolve the issue:
cross_entropy = tf.reduce_mean(tf.reduce_sum(
    tf.nn.softmax_cross_entropy_with_logits(y, y_, name='xentropy')))
By also printing the returned cross_entropy, you will see that your original code yields NaN, whereas this version produces real numbers.
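For intuition, here is a minimal NumPy sketch of the failure mode (my own illustration, not part of the original answer). Once the softmax saturates so that some output probability underflows to exactly 0.0 in float32 (likely aggravated here by the raw 0-255 pixel values in the CSV, which the tutorial's input_data reader scales to [0, 1]), the hand-rolled formula yields NaN, because 0 * log(0) evaluates to 0 * -inf = nan under IEEE arithmetic. softmax_cross_entropy_with_logits avoids this by fusing the softmax and the log into one numerically stable op.

import numpy as np

# Hypothetical saturated softmax output: one class collapsed to exactly 0.0.
y_pred = np.array([[0.0, 1.0]], dtype=np.float32)
# The label even matches the confident class.
y_true = np.array([[0.0, 1.0]], dtype=np.float32)
# 0 * log(0) = 0 * -inf = nan, which poisons the whole sum.
loss = -np.sum(y_true * np.log(y_pred), axis=1)
print(loss)  # [nan], and every gradient derived from it is useless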
The complete working code, which also prints the cross_entropy per iteration:
import pandas as pd
from sklearn.cross_validation import train_test_split
import numpy as np
import tensorflow as tf
# read csv file
MNIST = pd.read_csv("data.csv")
# pop label column and create training label array
train_label = MNIST.pop("label")
# converts from dataframe to np array
MNIST=MNIST.values
# convert train labels to one hots
train_labels = pd.get_dummies(train_label)
# make np array
train_labels = train_labels.values
x_train,x_test,y_train,y_test = train_test_split(MNIST,train_labels,test_size=0.2)
# we now have features (x_train) and y values, separated into test and train
# convert to dtype float 32
x_train,x_test,y_train,y_test = np.array(x_train,dtype='float32'), np.array(x_test,dtype='float32'),np.array(y_train,dtype='float32'),np.array(y_test,dtype='float32')
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
print(y.get_shape())
print(y_.get_shape())
cross_entropy = tf.reduce_mean(tf.reduce_sum(
    tf.nn.softmax_cross_entropy_with_logits(y, y_, name='xentropy')))
train_step = tf.train.GradientDescentOptimizer(0.0001).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
def get_mini_batch(x, y):
    # choose 100 random row indices (with replacement)
    rows = np.random.choice(x.shape[0], 100)
    # return arrays of 100 random rows (for features and labels)
    return x[rows], y[rows]
# train
for i in range(1000):
    # get a mini batch
    a, b = get_mini_batch(x_train, y_train)
    # run the train step, feeding arrays of 100 rows each time
    _, cost = sess.run([train_step, cross_entropy], feed_dict={x: a, y_: b})
    print(cost)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: x_test, y_: y_test}))
You still need to tune the learning rate and the number of iterations further, but with this setting you should already get ~70% accuracy.
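One possible way to sweep both (my own sketch, not part of the answer): feed the learning rate through a placeholder so the graph only has to be built once. The rates and iteration count below are illustrative, untuned values, and the snippet assumes the names from the code above (x, y_, cross_entropy, init, accuracy, get_mini_batch, and the train/test arrays).

# Hypothetical hyperparameter sweep; the learning rate is fed via a placeholder.
lr = tf.placeholder(tf.float32)
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)

for rate in [1e-4, 1e-3, 1e-2]:  # illustrative values, not tuned
    sess.run(init)  # reset W and b before each trial
    for i in range(1000):
        batch_x, batch_y = get_mini_batch(x_train, y_train)
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y, lr: rate})
    print(rate, sess.run(accuracy, feed_dict={x: x_test, y_: y_test}))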
Answer 1 (score: 0)
I'm pretty sure the batches should not be 100 random rows; they should be 100 consecutive rows, e.g. rows 0:99 and 100:199 would be your first two batches. Try the code below for your batching, and check this kernel for training MNIST from a CSV in TF.
epochs_completed = 0
index_in_epoch = 0
num_examples = train_images.shape[0]

# serve data by batches
def next_batch(batch_size):
    global train_images
    global train_labels
    global index_in_epoch
    global epochs_completed
    start = index_in_epoch
    index_in_epoch += batch_size
    # when all training data has been used, it is reordered randomly
    if index_in_epoch > num_examples:
        # finished epoch
        epochs_completed += 1
        # shuffle the data
        perm = np.arange(num_examples)
        np.random.shuffle(perm)
        train_images = train_images[perm]
        train_labels = train_labels[perm]
        # start next epoch
        start = 0
        index_in_epoch = batch_size
        assert batch_size <= num_examples
    end = index_in_epoch
    return train_images[start:end], train_labels[start:end]
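A minimal usage sketch (my addition, not from the answer): drive next_batch from the training loop, assuming train_images and train_labels are the float32 feature and one-hot label arrays built earlier, and that the graph (x, y_, train_step, accuracy, sess) from the code above is already constructed.

# Hypothetical training loop using the sequential next_batch helper.
BATCH_SIZE = 100
for i in range(1000):
    batch_xs, batch_ys = next_batch(BATCH_SIZE)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
print(sess.run(accuracy, feed_dict={x: x_test, y_: y_test}))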