火车精度随着火车损失而降低

时间:2020-10-16 10:17:38

标签: python tensorflow machine-learning keras neural-network

我写了这个非常简单的代码

model = keras.models.Sequential()
model.add(layers.Dense(13000, input_dim=X_train.shape[1], activation='relu', trainable=False))
model.add(layers.Dense(1, input_dim=13000, activation='linear'))
model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])

model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1)

数据为MNIST,但仅适用于数字“ 0”和“ 1”。 我遇到了一个非常奇怪的问题,即损耗正像预期的那样单调减少到零,而准确度却没有提高,而是降低了。

这是示例输出

12665/12665 [==============================] - 0s 11us/step - loss: 0.0107 - accuracy: 0.2355
Epoch 181/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0114 - accuracy: 0.2568
Epoch 182/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0128 - accuracy: 0.2726
Epoch 183/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0133 - accuracy: 0.2839
Epoch 184/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0134 - accuracy: 0.2887
Epoch 185/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0110 - accuracy: 0.2842
Epoch 186/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0101 - accuracy: 0.2722
Epoch 187/1000000

12665/12665 [==============================] - 0s 11us/step - loss: 0.0094 - accuracy: 0.2583

由于我们只有两类,因此最低准确性的基准应该是0.5,此外,我们正在监视训练集的准确性,因此它应该非常高地达到100%,我期望过度拟合,而我根据损失函数。

在最后一个时期,就是这种情况

12665/12665 [==============================] - 0s 11us/step - loss: 9.9710e-06 - accuracy: 0.0758
如果您随机猜出的最坏理论可能性是50%,则7%的准确性。这不是偶然的。发生了什么事。

有人可以看到问题吗?

完整代码

from tensorflow import keras
import numpy as np
from matplotlib import pyplot as plt
import keras
from keras.callbacks import Callback
from keras import layers
import warnings

class EarlyStoppingByLossVal(Callback):
    def __init__(self, monitor='val_loss', value=0.00001, verbose=0):
        super(Callback, self).__init__()
        self.monitor = monitor
        self.value = value
        self.verbose = verbose

    def on_epoch_end(self, epoch, logs={}):
        current = logs.get(self.monitor)
        if current is None:
            warnings.warn("Early stopping requires %s available!" % self.monitor, RuntimeWarning)

        if current < self.value:
            if self.verbose > 0:
                print("Epoch %05d: early stopping THR" % epoch)
            self.model.stop_training = True

def load_mnist():

    mnist = keras.datasets.mnist
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()


    train_images = np.reshape(train_images, (train_images.shape[0], train_images.shape[1] * train_images.shape[2]))
    test_images = np.reshape(test_images, (test_images.shape[0], test_images.shape[1] * test_images.shape[2]))
    train_labels = np.reshape(train_labels, (train_labels.shape[0],))
    test_labels = np.reshape(test_labels, (test_labels.shape[0],))

    train_images = train_images[(train_labels == 0) | (train_labels == 1)]
    test_images = test_images[(test_labels == 0) | (test_labels == 1)]

    train_labels = train_labels[(train_labels == 0) | (train_labels == 1)]
    test_labels = test_labels[(test_labels == 0) | (test_labels == 1)]
    train_images, test_images = train_images / 255, test_images / 255

    return train_images, train_labels, test_images, test_labels



X_train, y_train, X_test, y_test = load_mnist()
train_acc = []
train_errors = []
test_acc = []
test_errors = []

width_list = [13000]
for width in width_list:
    print(width)

    model = keras.models.Sequential()
    model.add(layers.Dense(width, input_dim=X_train.shape[1], activation='relu', trainable=False))
    model.add(layers.Dense(1, input_dim=width, activation='linear'))
    model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])

    callbacks = [EarlyStoppingByLossVal(monitor='loss', value=0.00001, verbose=1)]
    model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1, callbacks=callbacks)


    train_errors.append(model.evaluate(X_train, y_train)[0])
    test_errors.append(model.evaluate(X_test, y_test)[0])
    train_acc.append(model.evaluate(X_train, y_train)[1])
    test_acc.append(model.evaluate(X_test, y_test)[1])


plt.plot(width_list, train_errors, marker='D')
plt.xlabel("width")
plt.ylabel("train loss")
plt.show()
plt.plot(width_list, test_errors, marker='D')
plt.xlabel("width")
plt.ylabel("test loss")
plt.show()
plt.plot(width_list, train_acc, marker='D')
plt.xlabel("width")
plt.ylabel("train acc")
plt.show()
plt.plot(width_list, test_acc, marker='D')
plt.xlabel("width")
plt.ylabel("test acc")
plt.show()

1 个答案:

答案 0 :(得分:1)

(二进制)分类问题在最后一层的线性激活是毫无意义的;将最后一层更改为:

model.add(layers.Dense(1, input_dim=width, activation='sigmoid'))

最后一层的线性激活用于回归问题,而不用于分类问题。