I'm trying to reproduce some of the examples from Neural Networks and Deep Learning in Keras, but I'm having trouble training a network based on the architecture from Chapter 1. The goal is to classify handwritten digits from the MNIST dataset.
Architecture: a fully connected 784 * 30 * 10 network, with weights and biases drawn from a normal distribution (see the comments in the code below).
Hyperparameters: stochastic gradient descent with learning rate 3.0, mini-batch size 10, 30 epochs, mean squared error loss.
My code:
import keras  # needed for keras.utils.to_categorical below
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.initializers import RandomNormal
# import data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# input image dimensions
img_rows, img_cols = 28, 28
x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)
input_shape = (img_rows * img_cols,)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('y_train shape:', y_train.shape)
# Construct model
# 784 * 30 * 10
# Normal distribution for weights/biases
# Stochastic Gradient Descent optimizer
# Mean squared error loss (cost function)
model = Sequential()
layer1 = Dense(30,
               input_shape=input_shape,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer1)
layer2 = Dense(10,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer2)
print('Layer 1 input shape: ', layer1.input_shape)
print('Layer 1 output shape: ', layer1.output_shape)
print('Layer 2 input shape: ', layer2.input_shape)
print('Layer 2 output shape: ', layer2.output_shape)
model.summary()
model.compile(optimizer=SGD(lr=3.0),
              loss='mean_squared_error',
              metrics=['accuracy'])
# Train
model.fit(x_train,
          y_train,
          batch_size=10,
          epochs=30,
          verbose=2)
# Run on test data and output results
result = model.evaluate(x_test,
                        y_test,
                        verbose=1)
print('Test loss: ', result[0])
print('Test accuracy: ', result[1])
Output (using Python 3.6 with the TensorFlow backend):
Using TensorFlow backend.
x_train shape: (60000, 784)
60000 train samples
10000 test samples
y_train shape: (60000, 10)
Layer 1 input shape: (None, 784)
Layer 1 output shape: (None, 30)
Layer 2 input shape: (None, 30)
Layer 2 output shape: (None, 10)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 30)                23550
_________________________________________________________________
dense_2 (Dense)              (None, 10)                310
=================================================================
Total params: 23,860
Trainable params: 23,860
Non-trainable params: 0
_________________________________________________________________
Epoch 1/30
- 7s - loss: nan - acc: 0.0987
Epoch 2/30
- 7s - loss: nan - acc: 0.0987
(repeats like this for all 30 epochs)
Epoch 30/30
- 6s - loss: nan - acc: 0.0987
10000/10000 [==============================] - 0s 22us/step
Test loss: nan
Test accuracy: 0.098
As you can see, the network is not learning at all, and I can't figure out why. As far as I can tell, the shapes look fine. What am I doing that keeps the network from learning?
(By the way, I know that a cross-entropy loss and a softmax output layer would be better; however, judging from the linked book, they don't seem to be required. The network implemented by hand in Chapter 1 of the book learns successfully, and I'm trying to reproduce it before moving on.)
Answer 0 (score: 2):
You need to specify the activation for each layer, so for each layer it should look something like this:
layer2 = Dense(10,
               activation='sigmoid',
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
Notice that I specified the activation argument here. Also, for the last layer you should use activation="softmax", since you have multiple classes.
Another thing to consider is that classification (as opposed to regression) works best with a cross-entropy loss. So you may want to change the loss value in model.compile to loss='categorical_crossentropy'. However, this is not required, and you can still get results with the mean_squared_error loss.
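For example, the compile call from the question could become something like this (a sketch that keeps the question's optimizer and metrics; the learning rate is discussed next):
model.compile(optimizer=SGD(lr=3.0),  # see the note on the learning rate below
              loss='categorical_crossentropy',
              metrics=['accuracy'])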
If you still get nan values for the loss, you can try changing the learning rate of SGD.
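For instance, continuing from the question's script (0.1 here is only an illustrative value, not one this answer prescribes):
optimizer = SGD(lr=0.1)  # much smaller than lr=3.0; overly large steps are a common cause of NaN losses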
Using the script you showed, I got a test accuracy of 0.9425 just by changing the activation of the first layer to sigmoid and the activation of the second layer to softmax.
Answer 1 (score: 2):
Choosing MSE as the loss function in a classification problem is indeed strange, and I'm not sure the introductory nature of the exercise is a good enough reason, as argued in the linked book. Nevertheless:
- Your learning rate lr of 3.0 is huge; try something at least as low as 0.1, or even lower.
- You are missing the activations; add activation='sigmoid' to all your layers, including the last one (since you explicitly want to avoid softmax).
- Your stddev=1 values are again huge; try something in the range of 0.05 (the default value). Also, standard practice is to initialize the biases to zero.
You would be best off starting from the Keras MNIST MLP example and adapting it to your learning needs (number of layers, activation functions, etc.); a sketch combining the changes above follows this list.