I have read Michael Nielsen's book on neural networks and deep learning. He always uses the MNIST data as an example. I have now taken his code and built exactly the same network in TensorFlow, but I noticed that the results in TensorFlow are not the same (they are worse).
Here are the details:
1) Michael Nielsen's code can be found at https://github.com/kanban1992/MNIST_Comparison/tree/master/Michael_Nielsen. You can start everything with python start_2.py
This network must be correct, because it works well and I did not modify it!
2) The TensorFlow implementation was done by me and has exactly the same structure as the Nielsen net described in point 1 above. The complete code can be found at https://github.com/kanban1992/MNIST_Comparison/tree/master/tensorflow and is run with python start_train.py
With the TensorFlow approach I get an accuracy of 10% (which is the same as random guessing!), so something is not working and I don't know what!?
Here is an excerpt of the most important parts of the code:
x_training,y_training,x_validation,y_validation,x_test,y_test = mnist_loader.load_data_wrapper()
N_training=len(x_training)
N_validation=len(x_validation)
N_test=len(x_test)
N_epochs = 5
learning_rate = 3.0
batch_size = 10
N1 = 784 #equals N_inputs
N2 = 30
N3 = 30
N4 = 30
N5 = 10
N_in=N1
N_out=N5
x = tf.placeholder(tf.float32,[None,N1])#don't take the shape=(batch_size,N1) argument, because we need this for different batch sizes
W2 = tf.Variable(tf.random_normal([N1, N2],mean=0.0,stddev=1.0/math.sqrt(N1*1.0))) # initialize each neuron's weights with stddev 1/sqrt(number of inputs to the neuron, i.e. the number of neurons in the previous layer)
b2 = tf.Variable(tf.random_normal([N2]))
a2 = tf.sigmoid(tf.matmul(x, W2) + b2) #x=a1
W3 = tf.Variable(tf.random_normal([N2, N3],mean=0.0,stddev=1.0/math.sqrt(N2*1.0)))
b3 = tf.Variable(tf.random_normal([N3]))
a3 = tf.sigmoid(tf.matmul(a2, W3) + b3)
W4 = tf.Variable(tf.random_normal([N3, N4],mean=0.0,stddev=1.0/math.sqrt(N3*1.0)))
b4 = tf.Variable(tf.random_normal([N4]))
a4 = tf.sigmoid(tf.matmul(a3, W4) + b4)
W5 = tf.Variable(tf.random_normal([N4, N5],mean=0.0,stddev=1.0/math.sqrt(N4*1.0)))
b5 = tf.Variable(tf.random_normal([N5]))
y = tf.sigmoid(tf.matmul(a4, W5) + b5)
y_ = tf.placeholder(tf.float32,[None,N_out]) # ,shape=(batch_size,N_out)
quadratic_cost= tf.scalar_mul(1.0/(N_training*2.0),tf.reduce_sum(tf.squared_difference(y,y_)))
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(quadratic_cost)
init = tf.initialize_all_variables()
#launch the graph
sess = tf.Session()
sess.run(init)
#batch size of training input
N_training_batch=N_training/batch_size # integer division: rounds down to the nearest integer
correct=[0]*N_epochs
cost_training_data=[0.0]*N_epochs
for i in range(0,N_epochs):
    for j in range(0,N_training_batch):
        start=j*batch_size
        end=(j+1)*batch_size
        batch_x=x_training[start:end]
        batch_y=y_training[start:end]
        sess.run(train_step, feed_dict={x: batch_x,
                                        y_: batch_y})

    # reshuffle the training data after each epoch
    perm = np.arange(N_training)
    np.random.shuffle(perm)
    x_training = x_training[perm]
    y_training = y_training[perm]

    # cost after each epoch
    cost_training_data[i]=sess.run(quadratic_cost, feed_dict={x: x_training,
                                                              y_: y_training})

    # correct predictions after each epoch
    y_out_validation=sess.run(y,feed_dict={x: x_validation})
    for k in range(0,len(y_out_validation)):
        arg=np.argmax(y_out_validation[k])
        if 1.0==y_validation[k][arg]:
            correct[i]+=1

    print "correct after "+str(i)+" epochs: "+str(correct[i])
It would be really great if you could tell me what is going wrong :-)
Answer 0 (score: 1)
The learning rate for gradient descent seems very high. Try something more like 0.0001 and raise or lower it from there.
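As a minimal sketch (assuming the quadratic_cost and optimizer setup from the question), that is a one-line change; 0.0001 is only a starting guess to tune from:

learning_rate = 0.0001 # starting guess; raise or lower it based on how the training cost behaves
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(quadratic_cost)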
I like the Adam optimizer; make sure to start with a small learning rate (0.001 is, I think, the default for Adam):
optimizer = tf.train.AdamOptimizer(learning_rate)
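This only creates the optimizer; it still has to be attached to the cost so the training loop can run it. A minimal sketch, assuming the quadratic_cost from the question:

learning_rate = 0.001 # Adam's usual default learning rate
optimizer = tf.train.AdamOptimizer(learning_rate)
train_step = optimizer.minimize(quadratic_cost) # then feed batches to train_step exactly as before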