Question

I am a computer science teacher currently building an introductory course on deep learning. Python and the Keras framework are my tools of choice.

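I would like to show my students what overfitting is by training models of increasing complexity on some predefined 2D data, just like in this example.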

The same idea appears in a programming activity of Andrew Ng's course on neural networks tuning.

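However, no matter how hard I try, I cannot reproduce this behavior with Keras. Using the same dataset and hyperparameters, the decision boundaries are always "smoother" and the model never fits the noisy points in the dataset. See my results below and click here to browse the relevant code. Here is the relevant excerpt: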

# Varying the hidden layer size to observe underfitting and overfitting
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, hidden_layer_size in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {:d}'.format(hidden_layer_size))

    model = Sequential()
    model.add(Dense(hidden_layer_size, activation='tanh', input_shape=(2,)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=1.0), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=50)

    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)

Am I doing something wrong? Does Keras have some internal optimization mechanism? Could I mitigate it with other compile options?

Answer 1

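Your problem is that all of your examples are single-layer neural networks of different sizes! If you print the weights, you will notice that after increasing the size of the layer (for example from 5 to 50), the additional neurons (45 neurons in that example) have weights close to zero, so the networks are effectively the same.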
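One way to check this on your own trained model is to count the hidden units whose weights are all near zero. The snippet below is only a sketch: the helper `near_zero_units` and the tolerance `tol` are my own assumptions, and the arrays are made-up numbers rather than weights from a real run. In Keras you would get the real arrays from `model.layers[0].get_weights()` and `model.layers[1].get_weights()`.

```python
import numpy as np

def near_zero_units(hidden_kernel, output_kernel, tol=1e-2):
    """Indices of hidden units whose incoming and outgoing weights are all below tol.

    hidden_kernel: shape (n_inputs, n_hidden), kernel of the hidden Dense layer.
    output_kernel: shape (n_hidden, n_outputs), kernel of the output Dense layer.
    tol is a hypothetical threshold; pick one that suits your weight scale.
    """
    incoming = np.abs(hidden_kernel).max(axis=0)  # largest incoming weight per unit
    outgoing = np.abs(output_kernel).max(axis=1)  # largest outgoing weight per unit
    return np.flatnonzero((incoming < tol) & (outgoing < tol))

# Made-up weights for a 4-unit hidden layer (illustration only, not real results).
hidden_kernel = np.array([[1.5, 0.001, -2.0, 0.0005],
                          [-0.7, -0.002, 0.9, 0.001]])
output_kernel = np.array([[2.1], [0.003], [-1.4], [0.0]])

idle = near_zero_units(hidden_kernel, output_kernel)
print(idle)  # [1 3]
```

If most of the units in the 50-unit layer show up in that list, the wider layer is not adding much capacity in practice.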

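You have to increase the depth of your neural network to see overfitting. For example, I changed your code so that the first two examples are single-layer NNs and the third one ([30, 30, 30, 30]) is a four-layer NN (the complete source code is here):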

# Generate moon-shaped data with fewer samples and more noise
# data, targets = make_moons(500, noise=0.45)
from sklearn.datasets import make_moons, make_classification

data, targets = make_classification(n_samples=200, n_features=2, n_redundant=0,
                                    n_informative=2, random_state=2,
                                    n_clusters_per_class=2)
plot_data(data, targets)
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [[2], [20], [30, 30, 30, 30]]

for i, hidden_layer_sizes in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {}'.format(str(hidden_layer_sizes)))
    model = Sequential()
    for j, layer_size in enumerate(hidden_layer_sizes):
        if j == 0:
            # Only the first layer needs the input shape
            model.add(Dense(layer_size, activation='tanh', input_shape=(2,)))
        else:
            model.add(Dense(layer_size, activation='tanh'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=0.1), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=500)
    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)

And the results are as follows:

You can also achieve your goal with TensorFlow Playground. Please check it out! It has a nice interactive UI.
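To put rough numbers on how different the three architectures in the code above are, you can count their trainable parameters. This is a small sketch in plain Python; `dense_param_count` is a helper I made up for illustration, and it assumes fully connected layers with biases, 2 input features and 1 output unit as in the snippet.

```python
def dense_param_count(input_dim, hidden_layer_sizes, output_dim=1):
    """Weights plus biases of a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for size in list(hidden_layer_sizes) + [output_dim]:
        total += fan_in * size + size  # kernel entries + one bias per unit
        fan_in = size
    return total

counts = {tuple(sizes): dense_param_count(2, sizes)
          for sizes in [[2], [20], [30, 30, 30, 30]]}
for sizes, n_params in counts.items():
    print(sizes, n_params)
# (2,) 9
# (20,) 81
# (30, 30, 30, 30) 2911
```

For a Keras model, `model.count_params()` should report the same totals.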

Answer 2

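You can also increase the number of epochs and use 'relu' as the activation to get sharp edges like Andrew Ng's. I ran your notebook under Colaboratory with a 1-layer network of 50 neurons, and added noise to your moons to get separate colored areas. Please have a look, and don't forget to activate the GPU (in the French UI: exécution / modifier le type d'exécution, i.e. Runtime / Change runtime type).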

# Varying the hidden layer size to observe underfitting and overfitting
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [50]
for i, hidden_layer_size in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {:d}'.format(hidden_layer_size))

    model = Sequential()
    model.add(Dense(hidden_layer_size, activation='relu', input_shape=(2,)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=1.0), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=5000)

    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)

5000 epochs + relu (looks like what you want)

5000 epochs + tanh (tanh smooths the curve too much for you)
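A rough way to see why relu gives sharper edges than tanh: with one hidden relu layer, the output before the sigmoid is piecewise linear in the input, so the 0.5 contour is made of straight segments with corners, while tanh bends everywhere. The NumPy sketch below uses random weights (an assumption for illustration, not a trained model) and measures how often the output bends along a straight line through the input space.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hidden_layer_logit(x, w1, b1, w2, b2, activation):
    """Pre-sigmoid output of a one-hidden-layer network for inputs x of shape (n, 2)."""
    return activation(x @ w1 + b1) @ w2 + b2

# Random weights for a 50-unit hidden layer (illustration only).
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(2, 50)), rng.normal(size=50)
w2, b2 = rng.normal(size=50), 0.0

# Sample the network along a straight line through the 2D input space.
t = np.linspace(-3, 3, 2001)
line = np.stack([t, 0.5 * t], axis=1)

# A nonzero second difference means the output bends at that point.
relu_bend = np.abs(np.diff(hidden_layer_logit(line, w1, b1, w2, b2, relu), n=2))
tanh_bend = np.abs(np.diff(hidden_layer_logit(line, w1, b1, w2, b2, np.tanh), n=2))

relu_bent = np.mean(relu_bend > 1e-9)  # only at a few kinks (at most one per unit)
tanh_bent = np.mean(tanh_bend > 1e-9)  # almost everywhere
print(relu_bent, tanh_bent)
```

The relu output bends only at a handful of kinks, while the tanh output is curved almost everywhere, which matches the smoother look of the tanh plot.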

Answer 3

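I finally managed to get overfitting on my data by significantly increasing the number of gradient descent steps and parameter updates. This works with both the tanh and ReLU activation functions.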

Here is the updated line:

history = model.fit(x_train, y_train, verbose=0, epochs=5000, batch_size=200)

The complete code is here, and it gives the following results.
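For a sense of scale, here is a back-of-the-envelope count of parameter updates before and after the change. It is only a sketch: `total_updates` is a made-up helper, the dataset size of 500 samples is a hypothetical value, and the "before" case assumes the Keras default `batch_size=32` because the original `fit` call did not set one.

```python
import math

def total_updates(n_samples, epochs, batch_size=32):
    """Number of gradient steps: one per mini-batch, repeated every epoch."""
    return epochs * math.ceil(n_samples / batch_size)

n_samples = 500  # hypothetical dataset size, adjust to your data

before = total_updates(n_samples, epochs=50)                  # 50 * 16 batches
after = total_updates(n_samples, epochs=5000, batch_size=200)  # 5000 * 3 batches
print(before, after)  # 800 15000
```

Under these assumptions the model goes through the data 100 times more often and receives many more updates overall, even though each epoch now has fewer, larger batches.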

Showing overfitting on 2D data with Keras
