
Question

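I have just started learning TensorFlow and I want to build a DNN for MNIST. The tutorial has a very simple neural network with 784 input nodes, 10 output nodes, and no hidden nodes. I tried to modify that code to create a DNN. Here is my code. I think I only added one hidden layer with 500 nodes between the input and output layers, but the test accuracy is only 10%, which means the network is not being trained. Do you know what is wrong with my code?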

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import os
os.chdir('../')

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.zeros([784,500]))
B_h1=tf.Variable(tf.zeros([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.zeros([5,5]))
B_h2=tf.Variable(tf.zeros([5]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.zeros([10]))
W_o=tf.Variable(tf.zeros([500,10]))
y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)

y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

number_steps = 10000
batch_size = 100
for _ in range(number_steps):
  batch_xs, batch_ys = mnist.train.next_batch(batch_size)
  train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

OK, following @lejlot's suggestion, I changed my code to the following.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import os
os.chdir('../')

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.random_normal([784,500]))
B_h1=tf.Variable(tf.random_normal([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.random_normal([500,500]))
B_h2=tf.Variable(tf.random_normal([500]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.random_normal([10]))
W_o=tf.Variable(tf.random_normal([500,10]))
y= tf.matmul(h1,W_o)+B_o # notice no activation

y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.nn.log_softmax(y), # notice log_softmax
                  reduction_indices=[1]))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

number_steps = 10000
batch_size = 100
for i in range(number_steps):
  batch_xs, batch_ys = mnist.train.next_batch(batch_size)
  train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
  if i % 1000==0:
    acc=sess.run(accuracy,feed_dict={x: mnist.test.images, y_: mnist.test.labels})
    print('Current loop %d, Accuracy: %g'%(i,acc))



# Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

There are two modifications:

Initialize W_h1 and B_h1 with tf.random_normal
Change the definitions of y and cross_entropy

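The modification does work. But I still do not know what was wrong with my original code. I called tf.global_variables_initializer().run(), and I thought this function would randomize the values of W_h1 and B_h1. Also, if I define y and cross_entropy as follows, it does not work.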

y= tf.nn.softmax(tf.matmul(h1,W_o)+B_o) 
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),reduction_indices=[1]))

Answer 1

First of all, this is not a valid classifier model.

y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)

y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

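You are using the explicit equation for cross-entropy, which requires y to be a (row-wise) probability distribution, but you produce y by applying relu, which means you simply output some non-negative numbers. In fact, if you output a zero, your code will produce NaNs and fail (the log of 0 is minus infinity, and 0 times minus infinity is NaN).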
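To make the NaN concrete, here is a minimal sketch in plain Python (not TensorFlow, so it runs without the library). The helper names `cross_entropy`, `relu`, and `softmax` are my own for illustration, and the three-class logits are made up; the point is only to compare the explicit formula on a ReLU output against a softmax output.

```python
import math

def cross_entropy(y_true, y_pred):
    """Explicit cross-entropy -sum(t * log(p)) for one example, as in the question."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # math.log(0.0) raises ValueError, so mimic the IEEE result (-inf)
        # that a floating-point log of zero produces.
        log_p = math.log(p) if p > 0.0 else float("-inf")
        total += t * log_p  # 0.0 * -inf is nan
    return -total

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    shift = max(values)  # subtract the max before exponentiating
    exps = [math.exp(v - shift) for v in values]
    norm = sum(exps)
    return [e / norm for e in exps]

logits = [0.0, 0.0, 0.0]  # assumed example: what an all-zero network outputs
label = [0.0, 1.0, 0.0]   # one-hot target

loss_relu = cross_entropy(label, relu(logits))        # nan
loss_softmax = cross_entropy(label, softmax(logits))  # finite, equals log(3)
```

With the ReLU output the loss is NaN before training even starts, while the softmax output gives a finite loss for the same logits.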

You should use

y = tf.nn.softmax(tf.matmul(h1,W_o)+B_o)

instead. Or, even better (for better numerical stability):

y= tf.matmul(h1,W_o)+B_o # notice no activation

y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
                  -tf.reduce_sum(y_ * tf.nn.log_softmax(y), # notice log_softmax
                  reduction_indices=[1]))
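As a rough illustration of the numerical stability point, the sketch below (plain Python, with function names and logits I made up) compares taking the log of a softmax in two steps against computing log-softmax directly through the log-sum-exp identity. This is an assumption about why the softmax-then-log variant in the question may also fail: with weights drawn from tf.random_normal the logits can be very large in magnitude, so some softmax outputs can underflow to exactly 0.

```python
import math

def naive_log_softmax(logits):
    """log(softmax(z)) computed in two separate steps."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]  # very negative terms underflow to 0.0
    norm = sum(exps)
    probs = [e / norm for e in exps]
    # math.log(0.0) raises ValueError, so mimic the floating-point result (-inf).
    return [math.log(p) if p > 0.0 else float("-inf") for p in probs]

def stable_log_softmax(logits):
    """log-softmax via the identity z - max - log(sum(exp(z - max)))."""
    shift = max(logits)
    log_norm = math.log(sum(math.exp(z - shift) for z in logits))
    return [z - shift - log_norm for z in logits]

logits = [1000.0, 0.0, -1000.0]  # assumed example of widely spread logits
naive = naive_log_softmax(logits)    # contains -inf entries
stable = stable_log_softmax(logits)  # all entries finite
```

The two-step version loses the small probabilities entirely, and multiplying those -inf entries by the zeros in a one-hot label gives NaN. The direct version keeps every log-probability finite.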

UPDATE

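The second issue is initialization: you cannot initialize neural network weights to zeros; they have to be random numbers, typically sampled from a zero-mean Gaussian with low variance. Global initialization does not randomize the weights; it just runs all the initialization ops. If an initialization op is a constant op (such as zeros), it simply makes sure those zeros are assigned to the variables, nothing more (which is why it can also be used to reset a network, etc.). Zero initialization works only for convex problems such as logistic regression, not for complex models like neural networks.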
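One way to see why zeros do not work here is to write out the gradients by hand for a tiny network. The sketch below is plain Python rather than TensorFlow, and the sizes, input values, and function names are assumptions chosen for illustration. It uses a softmax output so that the initialization problem is isolated from the ReLU output problem, and it treats the ReLU derivative at 0 as 0.

```python
import math

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    norm = sum(exps)
    return [e / norm for e in exps]

def matvec(x, W):
    """Multiply row vector x (length n) by matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def zero_init_gradients(x, label, n_hidden):
    """Softmax cross-entropy gradients for one example in an all-zero one-hidden-layer net."""
    n_in, n_out = len(x), len(label)
    W_h = [[0.0] * n_hidden for _ in range(n_in)]   # like tf.zeros([n_in, n_hidden])
    W_o = [[0.0] * n_out for _ in range(n_hidden)]  # like tf.zeros([n_hidden, n_out])
    # Forward pass (the biases are zero as well, so they are omitted).
    pre = matvec(x, W_h)
    h = relu(pre)
    probs = softmax(matvec(h, W_o))
    # Backward pass.
    grad_b_o = [p - t for p, t in zip(probs, label)]       # dLoss/dlogits
    grad_W_o = [[h_k * d for d in grad_b_o] for h_k in h]  # zero because h is zero
    delta_h = [
        sum(W_o[k][j] * grad_b_o[j] for j in range(n_out)) * (1.0 if pre[k] > 0.0 else 0.0)
        for k in range(n_hidden)
    ]                                                      # zero because W_o is zero
    grad_W_h = [[x_i * d for d in delta_h] for x_i in x]
    return grad_W_h, grad_W_o, grad_b_o

grad_W_h, grad_W_o, grad_b_o = zero_init_gradients(
    x=[0.2, 0.0, 0.9, 0.5], label=[0.0, 1.0, 0.0], n_hidden=5)
```

In this sketch only the output bias gets a nonzero gradient. Both weight matrices receive exactly zero gradient, so gradient descent never moves them and the network cannot use its input, which is consistent with the chance-level accuracy reported in the question.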
